WO2020211783A1 - Adjusting method for quantization frequency of operational data and related product - Google Patents

Adjusting method for quantization frequency of operational data and related product

Info

Publication number
WO2020211783A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
quantization accuracy
quantization
neural network
accuracy
Application number
PCT/CN2020/084943
Other languages
French (fr)
Chinese (zh)
Inventor
刘少礼
张曦珊
曾洪博
孟小甫
Original Assignee
上海寒武纪信息科技有限公司
Priority claimed from CN201910306478.XA external-priority patent/CN111832710A/en
Priority claimed from CN201910306477.5A external-priority patent/CN111832709A/en
Priority claimed from CN201910306479.4A external-priority patent/CN111832711A/en
Priority claimed from CN201910307675.3A external-priority patent/CN111832696A/en
Priority claimed from CN201910307672.XA external-priority patent/CN111832695A/en
Priority claimed from CN201910306480.7A external-priority patent/CN111832712A/en
Application filed by 上海寒武纪信息科技有限公司
Publication of WO2020211783A1 publication Critical patent/WO2020211783A1/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Abstract

The present disclosure provides a method for adjusting the quantization frequency of operational data of a neural network, and a related product. By adjusting how often the quantization accuracy is updated during training, the method keeps the quantization parameters matched to the operational data, improving calculation accuracy or reducing the amount of calculation.

Description

Adjusting Method for Quantization Frequency of Operational Data and Related Product

Technical Field
The present disclosure relates to the field of neural networks, and in particular to a method for adjusting the quantization frequency of operational data, and a related product.
Background
The artificial neural network (ANN) has been a research hotspot in the field of artificial intelligence since the 1980s. It abstracts the neuron network of the human brain from an information-processing perspective, establishes a simple model, and forms different networks according to different connection modes. In engineering and academia it is often simply called a neural network. A neural network is a computational model composed of a large number of interconnected nodes (neurons). Existing neural network computation relies on a CPU (Central Processing Unit) or GPU (Graphics Processing Unit) to implement inference (i.e., the forward operation) and backward training. To reduce the amount of computation during training, the operational data are quantized, but existing quantization schemes never change the frequency of quantization. As a result, the quantization parameters may no longer match the operational data, which degrades the quantization effect and hurts either the calculation accuracy or the amount of calculation.
Summary
Embodiments of the present disclosure provide a method for adjusting the quantization frequency of operational data, and a related product, which can improve calculation accuracy or reduce the amount of calculation.
In a first aspect, a method for adjusting the quantization frequency of operational data is provided. The method is applied to an artificial intelligence processor and includes the following steps:
determining the operational data of the neural network;
acquiring a quantization command, where the quantization command includes the data type of the quantization accuracy and the quantization accuracy;
acquiring training parameters of the neural network, and determining the adjustment frequency of the quantization accuracy according to the training parameters and the data type of the quantization accuracy, so that the artificial intelligence processor adjusts the quantization accuracy according to the adjustment frequency.
Optionally, the operational data includes one of, or any combination of, the input neuron A, the output neuron B, the weight W, the input neuron derivative, the output neuron derivative, and the weight derivative;
the data type of the quantization accuracy specifically includes discrete quantization accuracy or continuous quantization accuracy.
Optionally, the training parameters include a training iteration or a training epoch.
Optionally, determining the adjustment frequency of the quantization accuracy according to the training parameters and the data type of the quantization accuracy specifically includes:
adjusting the quantization accuracy once every δ training iterations, where δ is fixed;
or adjusting the quantization accuracy once every δ training epochs, where δ is fixed;
or adjusting the quantization accuracy once every step iterations or epochs, where step = αδ and α is greater than 1;
or adjusting the quantization accuracy once every δ training iterations or epochs, where δ gradually decreases as the number of training passes increases.
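The four schedules above can be sketched as a single trigger predicate. This is an illustrative reading rather than code from the disclosure; `counter` counts iterations or epochs, and `delta`, `alpha`, and `decay` are placeholder names:

```python
def should_adjust(counter, delta, alpha=1.0, decay=None):
    """Decide whether the quantization accuracy is adjusted at this
    iteration/epoch counter.

    - fixed interval:     every `delta` steps (alpha=1, decay=None)
    - stretched interval: every `step = alpha * delta` steps (alpha > 1)
    - decaying interval:  `decay(counter)` supplies the current, shrinking delta
    """
    if decay is not None:
        delta = decay(counter)
    step = int(alpha * delta)
    return counter > 0 and counter % step == 0
```

With delta = 100 this triggers at iterations 100, 200, and so on; with alpha = 2 only at 200, 400, and so on.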
Optionally, the method for determining the quantization accuracy specifically includes:
determining the exponent s of the discrete quantization accuracy, or the continuous quantization accuracy f, according to the maximum absolute value of the operational data;
or determining s or f according to the minimum absolute value of the operational data;
or determining s or f according to the quantization-accuracy relationship between different data;
or determining s or f according to an empirical constant.
Optionally, determining s or f according to the maximum absolute value of the operational data specifically includes:
for discrete quantization accuracy, determining s by formula 1-1:
s_a = [log2(a_max) - bitnum + 1]   (formula 1-1)
for continuous quantization accuracy, determining f by formula 2-1 (formula 2-1 appears only as an image in the source and is not reproduced here);
where c is a constant, bitnum is the number of bits of the quantized data, and a_max is the maximum absolute value of the operational data.
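Formula 1-1 can be sketched as follows, reading the enclosing square bracket as a ceiling (an assumption; the source typography does not say). Formula 2-1 for f survives only as an image in the source, so no sketch of f is attempted:

```python
import math

def discrete_exponent(a_max, bitnum):
    """Formula 1-1: s = ceil(log2(a_max) - bitnum + 1), so that values up to
    a_max fit into `bitnum` bits of signed fixed-point data at scale 2**s."""
    return math.ceil(math.log2(a_max) - bitnum + 1)
```

For a_max = 6.0 and bitnum = 8 this gives s = -4, and the largest representable value (2**7 - 1) * 2**-4 = 7.9375 indeed covers 6.0.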
Optionally, determining s or f according to the minimum absolute value of the operational data specifically includes:
for discrete quantization accuracy, determining s by formula 1-2 (formula 1-2 appears only as an image in the source and is not reproduced here);
for continuous quantization accuracy, determining f by formula 2-2:
f_a = a_min * d   (formula 2-2)
where d is a constant and a_min is the minimum absolute value of the operational data.
Optionally, the maximum or minimum absolute value of the operational data is determined by:
searching for the maximum or minimum absolute value across all layers together;
or searching for it per layer and per data category;
or searching for it per layer, per data category, and per group.
Optionally, determining s or f according to the quantization accuracy of different data types specifically includes:
for discrete quantization accuracy, determining the discrete fixed-point precision s_a^(l) by formula 1-3:
s_a^(l) = Σ_{b≠a} α_b s_b^(l) + β_b   (formula 1-3)
where s_b^(l) is the known discrete fixed-point precision of the data b^(l) in the same layer as the data a^(l);
for continuous quantization accuracy, determining the continuous quantization accuracy f_a^(l) by formula 2-3:
f_a^(l) = Σ_{b≠a} α_b f_b^(l) + β_b   (formula 2-3)
where f_b^(l) is the known continuous quantization accuracy of the data b^(l) in the same layer as the data a^(l).
Optionally, determining s or f according to an empirical constant specifically includes:
setting s_a^(l) = C, where C is an integer constant;
or setting f_a^(l) = C, where C is a rational constant;
where s_a^(l) is the exponent of the discrete quantization accuracy of the data a^(l), and f_a^(l) is the continuous quantization accuracy of the data a^(l).
In a second aspect, an artificial intelligence processor is provided. The artificial intelligence processor includes:
a processing unit, configured to determine the operational data of the neural network;
an acquiring unit, configured to acquire a quantization command, where the quantization command includes the data type of the quantization accuracy and the quantization accuracy, and to acquire training parameters of the neural network;
the processing unit is further configured to determine the adjustment frequency of the quantization accuracy according to the training parameters and the data type of the quantization accuracy, so that the artificial intelligence processor adjusts the quantization accuracy according to the adjustment frequency.
In a third aspect, a neural network computing device is provided. The neural network computing device includes one or more of the chips provided in the second aspect.
In a fourth aspect, a combined processing device is provided. The combined processing device includes the neural network computing device provided in the third aspect, a universal interconnection interface, and a general-purpose processing device;
the neural network computing device is connected to the general-purpose processing device through the universal interconnection interface.
In a fifth aspect, an electronic device is provided, which includes the chip provided in the second aspect or the neural network computing device provided in the third aspect.
In a sixth aspect, a computer-readable storage medium is provided, which stores a computer program for electronic data exchange, where the computer program causes a computer to execute the method provided in the first aspect.
In a seventh aspect, a computer program product is provided, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the method provided in the first aspect.
In a first aspect, a hybrid quantization method for operational data is provided. The method is applied to an artificial intelligence processor and includes the following steps:
determining the operational data; acquiring a quantization command, where the quantization command includes the data type of the quantization accuracy, and extracting, from a computation library, a quantization function corresponding to the data type; dividing the operational data into g groups of data, and applying a hybrid quantization operation to the g groups of data according to the data type of the quantization accuracy to obtain quantized data, so that the artificial intelligence processor performs operations on the quantized data, where g is an integer greater than or equal to 2.
Optionally, the operational data includes one of, or any combination of, the input neuron A, the output neuron B, the weight W, the input neuron derivative, the output neuron derivative, and the weight derivative;
the data type of the quantization accuracy specifically includes discrete quantization accuracy or continuous quantization accuracy.
Optionally, applying a hybrid quantization operation to the g groups of data according to the data type of the quantization accuracy specifically includes:
quantizing the g groups of data using at least two data types of quantization accuracy, where all data within a single group use the same data type of quantization accuracy.
Optionally, dividing the operational data into g groups of data specifically includes:
dividing the operational data into g groups according to the network layers of the neural network;
or dividing the operational data into g groups according to the number of cores of the artificial intelligence processor.
Optionally, quantizing the g groups of data using at least two data types of quantization accuracy specifically includes:
quantizing one part of the g groups of data with discrete quantization accuracy, and quantizing another part with continuous quantization accuracy.
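A minimal sketch of the grouped hybrid quantization, under the usual reading of the two schemes (an assumption, since the disclosure only names them): discrete accuracy rounds to an integer code at scale 2**s and clamps to the signed bitnum-bit range, while continuous accuracy rounds to a multiple of the step f:

```python
def quantize_discrete(x, s, bitnum=8):
    # fixed-point: integer code at scale 2**s, clamped to the signed range
    q = round(x / 2 ** s)
    lo, hi = -(2 ** (bitnum - 1)), 2 ** (bitnum - 1) - 1
    return max(lo, min(hi, q))

def mixed_quantize(groups, precisions):
    """Quantize g groups of data, each group with its own scheme:
    ("discrete", s) or ("continuous", f). All values inside one group
    share the same data type of quantization accuracy."""
    out = []
    for data, (kind, p) in zip(groups, precisions):
        if kind == "discrete":
            out.append([quantize_discrete(v, p) for v in data])
        else:
            out.append([round(v / p) * p for v in data])
    return out
```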
Optionally, extracting the quantization function corresponding to the data type from the computation library specifically includes:
if the data type of the quantization accuracy is the exponent s of discrete quantization accuracy, computing the quantized data from the discrete quantization accuracy and the element values of the operational data;
if the data type of the quantization accuracy is continuous quantization accuracy, computing the quantized data from the continuous quantization accuracy and the element values of the operational data.
Optionally, before dividing the operational data into g groups, the method further includes determining the quantization accuracy of the operational data according to the operational data and the data type, which specifically includes:
determining s or f according to the maximum absolute value of the operational data;
or determining s or f according to the minimum absolute value of the operational data;
or determining s or f according to the quantization accuracy of different data;
or determining s or f according to an empirical constant.
Optionally, determining s or f according to the maximum absolute value of the operational data specifically includes:
determining s by formula 1-1:
s = [log2(a_max) - bitnum + 1]   (formula 1-1)
or determining f by formula 2-1 (formula 2-1 appears only as an image in the source and is not reproduced here);
where c is a constant, bitnum is the number of bits of the quantized data, and a_max is the maximum absolute value of the operational data.
Optionally, determining s or f according to the minimum absolute value of the operational data specifically includes:
determining s by formula 1-2 (formula 1-2 appears only as an image in the source and is not reproduced here);
or determining f by formula 2-2:
f = a_min * d   (formula 2-2)
where d is a constant and a_min is the minimum absolute value of the operational data.
Optionally, the maximum or minimum absolute value of the operational data is determined by:
searching for the maximum or minimum absolute value across all layers together;
or searching for it per layer and per data category;
or searching for it per layer, per data category, and per group.
Optionally, determining s or f according to the quantization accuracy of different data specifically includes:
determining the discrete quantization accuracy s_a^(l) by formula 1-3:
s_a^(l) = Σ_{b≠a} α_b s_b^(l) + β_b   (formula 1-3)
where s_b^(l) is the known exponent of the discrete quantization accuracy of the data b^(l) in the same layer as the data a^(l);
or determining the continuous quantization accuracy f_a^(l) by formula 2-3:
f_a^(l) = Σ_{b≠a} α_b f_b^(l) + β_b   (formula 2-3)
where f_b^(l) is the known continuous quantization accuracy of the data b^(l) in the same layer as the data a^(l), and the superscript l denotes the l-th layer.
Optionally, determining s or f according to an empirical constant specifically includes:
setting s_a^(l) = C, where C is an integer constant;
or setting f_a^(l) = C, where C is a rational constant;
where s_a^(l) is the exponent of the discrete quantization accuracy of the data a^(l) in the l-th layer, and f_a^(l) is the continuous quantization accuracy of the data a^(l) in the l-th layer.
Optionally, the method further includes:
dynamically adjusting s or f.
Optionally, dynamically adjusting s or f specifically includes:
adjusting s or f upward according to the maximum absolute value of the data to be quantized;
or adjusting s or f upward step by step according to the maximum absolute value of the data to be quantized;
or adjusting s or f upward in a single step according to the distribution of the data to be quantized;
or adjusting s or f upward step by step according to the distribution of the data to be quantized;
or adjusting s or f downward according to the maximum absolute value of the data to be quantized.
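One reading of "adjusting s upward according to the maximum absolute value of the data to be quantized" is to recompute the exponent whenever the current range no longer covers a_max; the single-step variant moves s by at most one per trigger. Both variants are illustrative assumptions, reusing formula 1-1 with the bracket read as a ceiling:

```python
import math

def adjust_exponent(s, a_max, bitnum, single_step=False):
    """Move the discrete-precision exponent toward the value implied by the
    current a_max. With single_step=True, s changes by at most 1 per call."""
    target = math.ceil(math.log2(a_max) - bitnum + 1)
    if not single_step:
        return target
    if target > s:
        return s + 1
    if target < s:
        return s - 1
    return s
```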
Optionally, the method further includes:
dynamically adjusting the trigger frequency of adjusting s or f.
Optionally, dynamically adjusting the trigger frequency of s or f specifically includes:
adjusting s or f once every δ training iterations, where δ is fixed;
or adjusting s or f once every δ training epochs, where δ is fixed;
or adjusting s or f once every step iterations or epochs, where step = αδ and α is greater than 1;
or adjusting s or f once every δ training iterations or epochs, where δ gradually decreases as the number of training passes increases.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of a neural network training method.
Fig. 2 is a schematic flowchart of a method for adjusting the quantization frequency of operational data.
Fig. 3a is a schematic representation of discrete fixed-point data.
Fig. 3b is a schematic representation of continuous fixed-point data.
Fig. 4a is a schematic diagram of a chip.
Fig. 4b is a schematic diagram of another chip.
Fig. 5a is a schematic structural diagram of a combined processing device disclosed herein.
Fig. 5b is another schematic structural diagram of a combined processing device disclosed herein.
Fig. 5c is a schematic structural diagram of a neural network processor board card provided by an embodiment of the disclosure.
Fig. 5d is a schematic structural diagram of a neural network chip packaging structure provided by an embodiment of the disclosure.
Fig. 5e is a schematic structural diagram of a neural network chip provided by an embodiment of the disclosure.
Fig. 6 is a schematic diagram of a neural network chip packaging structure provided by an embodiment of the disclosure.
Fig. 6a is a schematic diagram of another neural network chip packaging structure provided by an embodiment of the disclosure.
Detailed Description
To enable those skilled in the art to better understand the solutions of this disclosure, the technical solutions in the embodiments of this disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part, not all, of the embodiments of this disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this disclosure without creative work fall within the protection scope of this disclosure.
The quantization method for operational data provided in this application can run on a processor. The processor may be a general-purpose processor, such as a central processing unit (CPU) or a graphics processing unit (GPU), and the method may of course also be implemented in an artificial intelligence processor. This application does not limit the specific form of the processor.
Referring to Fig. 1, Fig. 1 is a training schematic diagram of one operation layer of a neural network provided by this application. As shown in Fig. 1, the operation layer may be a fully connected layer or a convolutional layer: a fully connected layer corresponds to a fully connected operation, such as the matrix multiplication shown in Fig. 1, while a convolutional layer corresponds to a convolution operation. Training consists of forward inference (inference for short) and backward training; in Fig. 1 the solid lines show the forward inference process and the dashed lines show the backward training process. In forward inference, the input data of the operation layer and the weights are operated on to produce the output data of the layer, and this output data can serve as the input data of the next layer. In backward training, the output-data gradient of the layer and the weights are operated on to produce the input-data gradient, and the output-data gradient and the input data are operated on to produce the weight gradient. The weight gradient is used to update the weights of this layer, and the input-data gradient serves as the output-data gradient of the next layer.
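The forward and backward data flow of Fig. 1 for a fully connected layer can be sketched as below; the naive matrix helpers and the plain SGD weight update are illustrative assumptions, not the disclosed implementation:

```python
def matmul(a, b):
    # naive matrix product for small lists of lists
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def fc_backward(x, w, grad_out, lr=0.01):
    """One backward-training step: the output-data gradient and the weights
    give the input-data gradient (passed on as the next layer's output-data
    gradient); the input and the output-data gradient give the weight
    gradient, which updates this layer's weights."""
    grad_in = matmul(grad_out, transpose(w))
    grad_w = matmul(transpose(x), grad_out)
    w_new = [[w[i][j] - lr * grad_w[i][j] for j in range(len(w[0]))]
             for i in range(len(w))]
    return grad_in, grad_w, w_new
```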
Referring to Fig. 2, Fig. 2 provides a method for adjusting the quantization frequency of operational data. The operation may include forward inference and/or backward training. The method is executed by a computing chip, which may be a general-purpose processor such as a central processing unit or a graphics processing unit, or a dedicated neural network processor. The method may of course also be executed by a device containing the computing chip. As shown in Fig. 2, the method includes the following steps:
Step S201: the computing chip determines the operational data of the neural network.
Optionally, the operational data in step S201 includes, but is not limited to, one of or any combination of the input neuron A, the output neuron B, the weight W, the input neuron derivative, the output neuron derivative, and the weight derivative.
Step S202: the computing chip acquires a quantization command, where the quantization command includes the data type of the quantization accuracy and the quantization accuracy.
Optionally, the data type of the quantization accuracy specifically includes discrete quantization accuracy or continuous quantization accuracy f. The discrete quantization accuracy can be expressed as 2^s, where s is the exponent of the discrete quantization accuracy. Fig. 3a is a schematic representation of discrete fixed-point data, and Fig. 3b is a schematic representation of continuous fixed-point data.
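Under the fixed-point reading of discrete quantization accuracy 2^s (a one-sign-bit layout assumed from Fig. 3a, which is not reproduced here), a signed bitnum-bit value covers the following range:

```python
def fixed_point_range(s, bitnum):
    """Smallest value, largest value, and resolution of signed `bitnum`-bit
    fixed-point data with discrete quantization accuracy 2**s."""
    step = 2 ** s
    return -(2 ** (bitnum - 1)) * step, (2 ** (bitnum - 1) - 1) * step, step
```

For example, 8-bit data with s = -2 represents values from -32.0 to 31.75 in steps of 0.25.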
Step S203: the computing chip acquires training parameters of the neural network and determines the adjustment frequency of the quantization accuracy according to the training parameters and the data type of the quantization accuracy, so that the artificial intelligence processor adjusts the quantization accuracy according to the adjustment frequency.
Optionally, the training parameters include a training iteration or a training epoch. A training iteration means that one training sample completes one pass of computation, i.e., one forward operation and one backward training pass. A training epoch means that all samples in the training set are trained once, i.e., one forward operation and one backward training pass over every sample in the training set.
In the technical solution provided by this application, after the operational data of the neural network is determined, the type of the quantization accuracy and the quantization accuracy are determined through the quantization command, the training parameters are then acquired, and the adjustment frequency of the quantization accuracy is determined according to the training parameters, so that the artificial intelligence processor adjusts the quantization accuracy of the neural network data at that frequency. This avoids the mismatch between quantization accuracy and operational data that arises when the quantization accuracy is left unadjusted for a long time, keeps the quantization accuracy better matched to the operational data, and improves calculation accuracy or reduces the amount of calculation.
Optionally, determining the adjustment frequency of the quantization precision according to the training parameters and the data type of the quantization precision specifically includes:
a) Never trigger an adjustment, i.e., keep s and f fixed.
b) Adjust the quantization precision once every δ training iterations, with δ fixed; the adjustment itself can follow the methods for adjusting s and f described above. One iteration means that one training sample completes one forward operation and one backward training pass.
Preferably, δ can be set to different values for different data types; for example, δ can be set to 100 for data types such as input neurons, output neurons, and weights, and to 20 for neuron-derivative data types.
c) Adjust the quantization precision once every δ training epochs, with δ fixed. One epoch means that every sample in the training set has been trained once, i.e., one forward operation and one backward training pass have been performed for all samples of the training set.
d) Adjust the quantization precision once every step training iterations or epochs, where step = αδ and α is greater than 1; that is, the interval between adjustments is stretched from δ to αδ.
e) Adjust the quantization precision once every δ training iterations or epochs, where δ gradually decreases as the number of training passes increases; for example, δ = 100 when the training count reaches 100, δ = 90 when it reaches 180, and δ = 80 when it reaches 260.
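The trigger strategies b), c), and e) above can be sketched as simple predicates. The breakpoints mirror the example values given in the text, but the helper names and the behaviour between breakpoints are illustrative assumptions:

```python
def should_adjust(count: int, delta: int) -> bool:
    """Strategies b)/c): trigger an adjustment of the quantization precision
    once every delta iterations (or epochs), with delta fixed."""
    return count > 0 and count % delta == 0

def decayed_delta(train_count: int) -> int:
    """Strategy e): delta shrinks as training proceeds. Breakpoints follow the
    example in the text (count 100 -> delta 100, 180 -> 90, 260 -> 80); the
    values between breakpoints are an assumption."""
    for threshold, delta in ((260, 80), (180, 90), (100, 100)):
        if train_count >= threshold:
            return delta
    return 100
```

Strategy d) is the same predicate with the fixed interval delta replaced by the stretched interval step = alpha * delta.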
Optionally, the quantization precision can be determined by any one of the following methods:
Method A: determine s and f from the maximum absolute value of the operation data;
Specifically, the computing chip determines the maximum absolute value a_max of the operation data and determines s or f by the following formulas;
s can be determined by formula 1-1:
S_a = [log2(a_max) - bitnum + 1]    formula (1-1)
f can be determined by formula 2-1:
Figure PCTCN2020084943-appb-000034
where c is a constant that may be any rational number; preferably, c is a rational number in [1, 1.2], although c may also lie outside this range, as long as it is rational. bitnum is the number of bits of the quantized data; referring to Fig. 3a and Fig. 3b, Fig. 3a is a schematic representation of discrete fixed-point data and Fig. 3b of continuous fixed-point data; for discrete fixed-point data, bitnum can take the value 8, 16, 24, or 32.
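Formula 1-1 can be sketched as follows. Reading the square brackets as a ceiling is an interpretation, and `quantize_fixed` is an illustrative companion (not from the application) showing that the resulting shift s lets values up to a_max fit in bitnum bits:

```python
import math

def shift_from_absmax(a_max: float, bitnum: int) -> int:
    """Formula 1-1: s = [log2(a_max) - bitnum + 1], with the brackets read as
    a ceiling (an assumption). 2**s is the step size of the fixed-point grid."""
    return math.ceil(math.log2(a_max) - bitnum + 1)

def quantize_fixed(x: float, s: int, bitnum: int) -> int:
    """Round x to the nearest multiple of 2**s and clamp to the signed
    bitnum-bit integer range."""
    q = round(x / (2 ** s))
    lo, hi = -(2 ** (bitnum - 1)), 2 ** (bitnum - 1) - 1
    return max(lo, min(hi, q))
```

For example, a_max = 6.0 with bitnum = 8 gives s = -4, i.e., a step of 1/16, and 6.0 quantizes to the integer code 96, within the 8-bit range.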
Optionally, a_max can be selected in several ways. Specifically, a_max can be searched for by data category; it can also be searched for by layer, by category, or by group.
A1. The computing chip can search for the maximum absolute value by category across all layers. The computing chip treats each element of the data to be operated on as a_i^(l), an element of the layer-l data a^(l), where a^(l) may be the layer-l values of the input neurons A, the output neurons B, the weights W, the input-neuron derivatives, the output-neuron derivatives, or the weight derivatives. It traverses all layers of the neural network and finds, for each category of data, the maximum absolute value over all layers.
A2. The computing chip can search for the maximum absolute value by layer and by category. The computing chip treats each element of the data to be operated on as a_i^(l), where a^(l) may be the layer-l values of the input neurons A, the output neurons B, the weights W, the input-neuron derivatives, the output-neuron derivatives, or the weight derivatives, and finds the maximum absolute value of each category of data within each layer. Of course, in practical applications, data spanning more than a single layer l can also be extracted; for example, the maximum absolute value of each category of data over λ layers can be extracted, where λ is an integer greater than or equal to 2.
A3. The computing chip can search for the maximum absolute value by layer, by category, and by group. The computing chip treats each element of the data to be operated on as a_i^(l), where a^(l) may be the layer-l values of the input neurons A, the output neurons B, the weights W, the input-neuron derivatives, the output-neuron derivatives, or the weight derivatives. It divides each category of data in each layer into g groups (g may be an empirical value or a user-set value), traverses all layers of the neural network, and finds the maximum absolute value within each of the g groups of each category of each layer. The input neurons A, the output neurons B, the weights W, the input-neuron derivatives, the output-neuron derivatives, and the weight derivatives each constitute one category of data.
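The search granularities A1 and A3 above can be sketched as follows; the dict-of-lists layout for per-layer data is a hypothetical representation, not one fixed by the application:

```python
def absmax_by_category(layers):
    """A1: one maximum absolute value per data category, taken over all layers.
    `layers` is a list (one entry per layer) of dicts mapping a category name
    such as "weight" to a flat list of that layer's values."""
    out = {}
    for layer in layers:
        for name, values in layer.items():
            m = max(abs(v) for v in values)
            out[name] = max(m, out.get(name, 0.0))
    return out

def absmax_by_group(values, g):
    """A3, within one layer and one category: split the values into g
    contiguous groups and return one maximum absolute value per group."""
    size = -(-len(values) // g)  # ceiling division
    return [max(abs(v) for v in values[i:i + size])
            for i in range(0, len(values), size)]
```

A2 corresponds to taking max(abs(...)) over a single layer's entry for one category, i.e., the inner step of `absmax_by_category` without merging across layers.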
Method B: determine s and f from the minimum absolute value of the operation data;
Specifically, the computing chip determines the minimum absolute value a_min of the operation data and determines the fixed-point precision s or f from a_min.
The precision s can be determined by formula 1-2:
Figure PCTCN2020084943-appb-000053
The precision f can be determined by formula 2-2:
f_a = a_min * d    formula (2-2)
where d is a constant that may be any rational number.
The above a_min can be searched for by data category; it can also be searched for by layer, by category, or by group. The specific search can follow method A1, A2, or A3, simply replacing a_max with a_min.
Method C: determine s and f from the relationships between different data types:
The fixed-point precisions s of different data types within the same layer are correlated. For example, the fixed-point precision S_a^(l) of the layer-l data a^(l) can be determined from the fixed-point precision S_b^(l) of the layer-l data b^(l) according to formula 1-3.
S_a^(l) = Σ_{b≠a} α_b S_b^(l) + β_b    formula (1-3)
Similarly, the fixed-point precision f_a^(l) of the layer-l data a^(l) can be determined from the fixed-point precision f_b^(l) of the layer-l data b^(l) according to formula 2-3.
f_a^(l) = Σ_{b≠a} α_b f_b^(l) + β_b    formula (2-3)
where α_b and β_b are constants: for formula 1-3 they are integer constants, and for formula 2-3 they are rational constants.
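Formulas 1-3 and 2-3 are plain linear relations and can be sketched as below. The coefficient values in the example are illustrative, and reading each β_b as summed alongside its α_b term follows the flattened formula text:

```python
def precision_from_related_types(other_precisions, alpha, beta):
    """Formulas 1-3 / 2-3: derive the precision of one data type a in a layer
    from the precisions of the other data types b of the same layer,
    S_a = sum over b != a of (alpha_b * S_b + beta_b)."""
    return sum(alpha[b] * other_precisions[b] + beta[b]
               for b in other_precisions)
```

With integer coefficients this yields the integer precision s of formula 1-3; with rational coefficients it yields the continuous precision f of formula 2-3.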
The data type a^(l) can be one of the input neurons X^(l), the output neurons Y^(l), the weights W^(l), the input-neuron derivatives, the output-neuron derivatives, or the weight derivatives; the data type b^(l) can be another one of the input neurons X^(l), the output neurons Y^(l), the weights W^(l), the input-neuron derivatives, the output-neuron derivatives, or the weight derivatives.
Method D: the computing chip determines s and f from empirical constants:
Specifically, the fixed-point precision S_a^(l) of the layer-l data type a^(l) can be set manually, S_a^(l) = C, where C is an integer constant.
Likewise, the fixed-point precision f_a^(l) of the layer-l data type a^(l) can be set manually, f_a^(l) = C, where C is a rational constant.
This application further provides an artificial intelligence processor, the artificial intelligence processor including:
a processing unit, configured to determine the operation data of a neural network;
an obtaining unit, configured to obtain a quantization command, the quantization command including the data type of the quantization precision and the quantization precision, and to obtain the training parameters of the neural network;
the processing unit being further configured to determine the adjustment frequency of the quantization precision according to the training parameters and the data type of the quantization precision, so that the artificial intelligence processor adjusts the quantization precision at that frequency.
The present disclosure further discloses a neural network computing device, which includes one or more chips as shown in Fig. 4a or Fig. 4b, and may also include one or more of the artificial intelligence processors described above. The device obtains the data to be operated on and control information from other processing devices, performs the specified neural network operations, and passes the execution results to peripheral devices through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces, and servers. When more than one chip as shown in Fig. 4a or Fig. 4b is included, the chips can be linked together and transfer data through a specific structure, for example interconnected via a PCIe bus, to support larger-scale neural network operations. In that case, the chips may share one control system or have their own control systems, and they may share memory or each accelerator may have its own memory. Furthermore, any interconnection topology may be used.
The neural network computing device has high compatibility and can be connected to various types of servers through a PCIe interface.
The present disclosure further discloses a combined processing device, which includes the above neural network computing device, a universal interconnection interface, and another processing device (i.e., a general-purpose processing device). The neural network computing device interacts with the other processing device to jointly complete the operation specified by the user. Fig. 5a is a schematic diagram of the combined processing device.
The other processing device includes one or more processor types among general-purpose/special-purpose processors such as central processing units (CPU), graphics processing units (GPU), and neural network processors; the number of processors it includes is not limited. The other processing device serves as the interface between the neural network computing device and external data and control, performs data transfers, and carries out basic control of the neural network computing device such as starting and stopping it; the other processing device can also cooperate with the neural network computing device to complete computing tasks.
The universal interconnection interface is configured to transmit data and control instructions between the neural network computing device and the other processing device. The neural network computing device obtains the required input data from the other processing device and writes it to the on-chip storage of the neural network computing device; it can obtain control instructions from the other processing device and write them to an on-chip control cache; it can also read the data in a storage module of the neural network computing device and transmit it to the other processing device.
As shown in Fig. 5b, the structure optionally further includes a storage device for storing data needed by this computing unit/computing device or by other computing units; it is especially suitable for data to be operated on that cannot be stored entirely in the internal storage of the neural network computing device or the other processing device.
The combined processing device can serve as a system-on-chip (SoC) for devices such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the core area of the control portion, increasing the processing speed, and reducing the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface.
Referring to Fig. 5c, Fig. 5c is a schematic structural diagram of a neural network processor board card provided by an embodiment of the present disclosure. As shown in Fig. 5c, the neural network processor board card 10 includes a neural network chip package structure 11, a first electrical and non-electrical connection device 12, and a first substrate 13.
The present disclosure does not limit the specific structure of the neural network chip package structure 11. Optionally, as shown in Fig. 5d, the neural network chip package structure 11 includes a neural network chip 111, a second electrical and non-electrical connection device 112, and a second substrate 113.
The specific form of the neural network chip 111 involved in the present disclosure is not limited. The neural network chip 111 includes, but is not limited to, a neural network die integrating a neural network processor; the die may be made of silicon, germanium, quantum materials, molecular materials, or the like. Depending on the actual situation (for example, a harsh environment) and different application requirements, the neural network die can be packaged so that most of it is enclosed, with the pins on the die connected to the outside of the package structure through conductors such as gold wires for circuit connection with outer layers.
The present disclosure does not limit the specific structure of the neural network chip 111; optionally, refer to the device shown in Fig. 4a or Fig. 4b.
The present disclosure does not limit the types of the first substrate 13 and the second substrate 113; they may be printed circuit boards (PCB), printed wiring boards (PWB), or other circuit boards. The materials used to make the PCB are not limited either.
The second substrate 113 involved in the present disclosure is used to carry the neural network chip 111. The neural network chip package structure 11, obtained by connecting the neural network chip 111 and the second substrate 113 through the second electrical and non-electrical connection device 112, protects the neural network chip 111 and facilitates further packaging of the neural network chip package structure 11 with the first substrate 13.
The specific packaging method of the second electrical and non-electrical connection device 112, and the structure corresponding to that packaging method, are not limited; a suitable packaging method may be selected and simply adapted according to the actual situation and different application requirements, for example flip-chip ball grid array packaging (FCBGA), low-profile quad flat packaging (LQFP), quad flat packaging with heat sink (HQFP), quad flat no-lead packaging (QFN), or fine-pitch ball grid array packaging (FBGA).
Flip-chip packaging is suitable when a small packaged area is required or when the inductance of the leads and the signal transmission time are critical. In addition, wire bonding may be used to reduce cost and increase the flexibility of the package structure.
A ball grid array can provide more pins with a short average lead length per pin, enabling high-speed signal transmission; the package may alternatively use a pin grid array (PGA), zero insertion force (ZIF), single edge contact connection (SECC), land grid array (LGA), or the like.
Optionally, the neural network chip 111 and the second substrate 113 are packaged using flip-chip ball grid array packaging; a schematic diagram of the resulting neural network chip package structure is shown in Fig. 6. As shown in Fig. 6, the neural network chip package structure includes a neural network chip 21, pads 22, solder balls 23, a second substrate 24, connection points 25 on the second substrate 24, and pins 26.
The pads 22 are connected to the neural network chip 21, and solder balls 23 are formed by soldering between the pads 22 and the connection points 25 on the second substrate 24, connecting the neural network chip 21 and the second substrate 24 and thereby packaging the neural network chip 21.
The pins 26 are used for connection with external circuits of the package structure (for example, the first substrate 13 of the neural network processor board card 10), enabling transmission of external and internal data and allowing the neural network chip 21, or the neural network processor corresponding to it, to process the data. The present disclosure does not limit the type or number of pins; different pin forms may be selected according to different packaging technologies and arranged according to certain rules.
Optionally, the neural network chip package structure further includes an insulating filler placed in the gaps between the pads 22, the solder balls 23, and the connection points 25 to prevent interference between adjacent solder balls.
The material of the insulating filler may be silicon nitride, silicon oxide, or silicon oxynitride; the interference includes electromagnetic interference, inductive interference, and the like.
Optionally, the neural network chip package structure further includes a heat dissipation device for dissipating the heat generated while the neural network chip 21 is running. The heat dissipation device may be a metal sheet with good thermal conductivity, a heat sink, or a cooler such as a fan.
For example, as shown in Fig. 6a, the neural network chip package structure 11 includes a neural network chip 21, pads 22, solder balls 23, a second substrate 24, connection points 25 on the second substrate 24, pins 26, an insulating filler 27, thermal paste 28, and a metal-housing heat sink 29. The thermal paste 28 and the metal-housing heat sink 29 are used to dissipate the heat generated while the neural network chip 21 is running.
Optionally, the neural network chip package structure 11 further includes a reinforcing structure connected to the pads 22 and embedded in the solder balls 23 to enhance the connection strength between the solder balls 23 and the pads 22.
The reinforcing structure may be a metal wire structure or a columnar structure, which is not limited here.
The present disclosure does not limit the specific form of the first electrical and non-electrical connection device 12 either; with reference to the description of the second electrical and non-electrical connection device 112, the neural network chip package structure 11 may be attached by soldering, or the second substrate 113 and the first substrate 13 may be connected by connecting wires or in a pluggable manner, facilitating later replacement of the first substrate 13 or the neural network chip package structure 11.
Optionally, the first substrate 13 includes interfaces for memory units that expand the storage capacity, for example synchronous dynamic random-access memory (SDRAM) or double data rate SDRAM (DDR); expanding the memory improves the processing capability of the neural network processor.
The first substrate 13 may also include a Peripheral Component Interconnect Express (PCI-E or PCIe) interface, a small form-factor pluggable (SFP) interface, an Ethernet interface, a Controller Area Network (CAN) interface, or the like for data transmission between the package structure and external circuits, which improves the operation speed and the convenience of operation.
The neural network processor is packaged into the neural network chip 111, the neural network chip 111 is packaged into the neural network chip package structure 11, and the neural network chip package structure 11 is packaged into the neural network processor board card 10. The board card exchanges data with external circuits (for example, a computer motherboard) through its interface (a slot or a ferrule); that is, the function of the neural network processor is realized directly by using the neural network processor board card 10, which also protects the neural network chip 111. Other modules may also be added to the neural network processor board card 10, which increases the range of application and the operating efficiency of the neural network processor.
In one embodiment, the present disclosure discloses an electronic device, which includes the neural network processor board card 10 or the neural network chip package structure 11 described above.
The electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a webcam, a server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle includes an airplane, a ship, and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound instrument, and/or an electrocardiograph.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present disclosure in detail. It should be understood that the above are merely specific embodiments of the present disclosure and are not intended to limit it; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this disclosure shall fall within its scope of protection.

Claims (16)

  1. A method for adjusting the quantization frequency of operation data, characterized in that the method is applied to an artificial intelligence processor and includes the following steps:
    determining the operation data of a neural network;
    obtaining a quantization command, the quantization command including the data type of the quantization precision and the quantization precision;
    obtaining training parameters of the neural network, and determining the adjustment frequency of the quantization precision according to the training parameters and the data type of the quantization precision, so that the artificial intelligence processor adjusts the quantization precision at that frequency.
  2. The method according to claim 1, characterized in that
    the operation data include one of, or any combination of, the input neurons A, the output neurons B, the weights W, the input-neuron derivatives, the output-neuron derivatives, and the weight derivatives;
    the data type of the quantization precision specifically includes discrete quantization precision or continuous quantization precision.
  3. The method according to claim 1 or 2, characterized in that
    the training parameters include the training iteration or the training epoch.
  4. The method according to claim 3, wherein determining the adjustment frequency of the quantization accuracy according to the training parameters and the data type of the quantization accuracy specifically comprises:
    adjusting the quantization accuracy once every δ training iterations, where δ is fixed;
    or adjusting the quantization accuracy once every δ training epochs, where δ is fixed;
    or adjusting the quantization accuracy once every step iterations or epochs, where step = αδ and α is greater than 1;
    or adjusting the quantization accuracy once every δ training iterations or epochs, where δ gradually decreases as the number of training steps increases.
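The four schedules of claim 4 can be expressed as a single predicate. This is a minimal sketch: the helper name, default values, and the particular decay rule are assumptions, since the claim only requires that δ decrease over training.

```python
def should_adjust(iteration, delta=100, alpha=None, decay_every=None):
    """True when quantization accuracy should be adjusted at this iteration.

    - fixed interval: every `delta` iterations (or epochs)
    - step schedule: every step = alpha * delta iterations, with alpha > 1
    - decaying delta: the interval shrinks as training progresses
      (the shrink rule below is one illustrative choice)
    """
    if alpha is not None:                  # step = alpha * delta
        interval = int(alpha * delta)
    elif decay_every is not None:          # delta decreases with training
        interval = max(1, delta // (1 + iteration // decay_every))
    else:                                  # fixed delta
        interval = delta
    return iteration > 0 and iteration % interval == 0
```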
  5. The method according to claim 4, wherein the method for determining the quantization accuracy specifically comprises:
    determining the exponent s of the discrete quantization accuracy, or the continuous quantization accuracy f, according to the maximum absolute value of the operation data;
    or determining s and f according to the minimum absolute value of the operation data;
    or determining s and f according to the quantization-accuracy relationship between different data;
    or determining s and f according to an empirical constant.
  6. The method according to claim 5, wherein determining s or f according to the maximum absolute value of the operation data specifically comprises:
    for discrete quantization accuracy, determining s by Formula 1-1:
    S_a = [log2(a_max) - bitnum + 1]  Formula (1-1)
    for continuous quantization accuracy, determining f by Formula 2-1:
    Figure PCTCN2020084943-appb-100004  Formula (2-1)
    where c is a constant, bitnum is the bit width of the quantized data, and a_max is the maximum absolute value of the operation data.
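Formula 1-1 can be computed directly. In this sketch the square bracket in the formula is read as a ceiling, which is an assumption (the original does not spell out the rounding mode), and the function name is illustrative.

```python
import math

def discrete_exponent(a_max, bitnum):
    """Formula (1-1): S_a = [log2(a_max) - bitnum + 1].

    The square bracket is read here as a ceiling (an assumption);
    bitnum is the bit width of the quantized data.
    """
    return math.ceil(math.log2(a_max) - bitnum + 1)

# a_max = 6.0, bitnum = 8: log2(6) is about 2.585, so S_a = ceil(-4.415) = -4,
# i.e. quantized values are integers scaled by 2**(-4) = 1/16.
```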
  7. The method according to claim 5, wherein determining s and f according to the minimum absolute value of the operation data specifically comprises:
    for discrete quantization accuracy, determining the accuracy s by Formula 1-2:
    Figure PCTCN2020084943-appb-100005  Formula (1-2)
    for continuous quantization accuracy, determining the accuracy f by Formula 2-2:
    f_a = a_min * d  Formula (2-2)
    where d is a constant and a_min is the minimum absolute value of the operation data.
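Formula 2-2 is a direct scaling of the minimum absolute value. The claim only states that d is a constant; the default value used in this sketch is an illustrative assumption.

```python
def continuous_accuracy_from_min(a_min, d=1.0):
    """Formula (2-2): f_a = a_min * d, with d a constant.

    The default d is an illustrative assumption; the claim leaves its
    value unspecified.
    """
    return a_min * d

# With a_min = 0.004 and d = 0.5, the continuous quantization accuracy
# (i.e. the quantization step) is 0.002.
```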
  8. The method according to claim 6 or 7, wherein the manner of determining the maximum or minimum absolute value of the operation data specifically comprises:
    finding the maximum or minimum absolute value across all layers and all categories together;
    or finding the maximum or minimum absolute value layer by layer and category by category;
    or finding the maximum or minimum absolute value layer by layer, category by category, and group by group.
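The three search granularities of claim 8 can be sketched over a nested layout (layer, then category, then group, then values). The data layout and function name are assumptions for illustration.

```python
def abs_extremes(net):
    """net: dict layer -> dict category -> dict group -> list of floats.

    Returns (overall, per_layer_category, per_layer_category_group) maps of
    (max |v|, min |v|), matching the three granularities of claim 8.
    """
    flat = [(layer, cat, grp, abs(v))
            for layer, cats in net.items()
            for cat, groups in cats.items()
            for grp, vals in groups.items()
            for v in vals]
    all_abs = [a for *_, a in flat]
    overall = (max(all_abs), min(all_abs))       # across all layers together

    per_lc, per_lcg = {}, {}
    for layer, cat, grp, a in flat:
        for key, table in (((layer, cat), per_lc),
                           ((layer, cat, grp), per_lcg)):
            hi, lo = table.get(key, (a, a))
            table[key] = (max(hi, a), min(lo, a))
    return overall, per_lc, per_lcg

net = {"conv1": {"weight": {"g0": [0.5, -2.0]}, "neuron": {"g0": [1.0]}}}
overall, per_lc, per_lcg = abs_extremes(net)
```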
  9. The method according to claim 5, wherein determining s and f according to the quantization accuracy of different data types specifically comprises:
    for discrete quantization accuracy, determining the discrete fixed-point accuracy S_a^(l) by Formula 1-3:
    S_a^(l) = Σ_{b≠a} α_b S_b^(l) + β_b  Formula (1-3)
    where S_b^(l) is the discrete fixed-point accuracy of data b^(l) in the same layer as data a^(l), and S_b^(l) is known;
    for continuous quantization accuracy, determining the continuous quantization accuracy f_a^(l) by Formula 2-3:
    f_a^(l) = Σ_{b≠a} α_b f_b^(l) + β_b  Formula (2-3)
    where f_b^(l) is the continuous quantization accuracy of data b^(l) in the same layer as data a^(l), and f_b^(l) is known.
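Formulas 1-3 and 2-3 infer the precision of one datum from the known precisions of same-layer data through per-datum affine terms. In this sketch the β_b term is placed inside the sum, following the formula as printed, and the coefficient values in the example are illustrative assumptions.

```python
def precision_from_same_layer(known, alpha, beta):
    """Formula (1-3)/(2-3): S_a = sum over b != a of (alpha_b * S_b + beta_b).

    `known` maps each same-layer datum b to its known precision S_b (or f_b);
    `alpha` and `beta` hold the per-datum coefficients. Coefficient values
    are application-specific; those used below are illustrative only.
    """
    return sum(alpha[b] * known[b] + beta[b] for b in known)

# Infer the precision exponent of output neurons from weights and inputs:
s_out = precision_from_same_layer(
    known={"weight": -6, "input": -4},
    alpha={"weight": 0.5, "input": 0.5},
    beta={"weight": 0, "input": 0},
)
# s_out = 0.5 * (-6) + 0.5 * (-4) = -5.0
```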
  10. The method according to claim 5, wherein determining s and f according to an empirical constant specifically comprises:
    setting s_a^(l) = C, where C is an integer constant;
    or setting f_a^(l) = C, where C is a rational constant;
    where s_a^(l) is the exponent of the discrete quantization accuracy of data a^(l), and f_a^(l) is the continuous quantization accuracy of data a^(l).
  11. An artificial intelligence processor, wherein the artificial intelligence processor comprises:
    a processing unit, configured to determine operation data of a neural network;
    an acquiring unit, configured to acquire a quantization command, the quantization command comprising: a data type of quantization accuracy and the quantization accuracy, and to acquire training parameters of the neural network;
    the processing unit being further configured to determine an adjustment frequency of the quantization accuracy according to the training parameters and the data type of the quantization accuracy, so that the artificial intelligence processor adjusts the quantization accuracy according to the adjustment frequency.
  12. A neural network operation device, wherein the neural network operation device comprises one or more artificial intelligence processors according to claim 11.
  13. A combined processing device, wherein the combined processing device comprises: the neural network operation device according to claim 12, a universal interconnection interface, and a general-purpose processing device;
    the neural network operation device being connected to the general-purpose processing device through the universal interconnection interface.
  14. An electronic device, wherein the electronic device comprises the artificial intelligence processor according to claim 11 or the neural network operation device according to claim 12.
  15. A computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to execute the method according to any one of claims 1-10.
  16. A computer program product, wherein the computer program product comprises a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the method according to any one of claims 1-10.
PCT/CN2020/084943 2019-04-16 2020-04-15 Adjusting method for quantization frequency of operational data and related product WO2020211783A1 (en)

Applications Claiming Priority (12)

Application Number Priority Date Filing Date Title
CN201910307672.X 2019-04-16
CN201910306478.XA CN111832710A (en) 2019-04-16 2019-04-16 Method for adjusting quantization frequency of operation data and related product
CN201910306477.5A CN111832709A (en) 2019-04-16 2019-04-16 Mixed quantization method of operation data and related product
CN201910306479.4A CN111832711A (en) 2019-04-16 2019-04-16 Method for quantizing operation data and related product
CN201910306480.7 2019-04-16
CN201910307675.3A CN111832696A (en) 2019-04-16 2019-04-16 Neural network operation method and related product
CN201910306479.4 2019-04-16
CN201910306478.X 2019-04-16
CN201910306477.5 2019-04-16
CN201910307672.XA CN111832695A (en) 2019-04-16 2019-04-16 Method for adjusting quantization precision of operation data and related product
CN201910306480.7A CN111832712A (en) 2019-04-16 2019-04-16 Method for quantizing operation data and related product
CN201910307675.3 2019-04-16

Publications (1)

Publication Number Publication Date
WO2020211783A1 true WO2020211783A1 (en) 2020-10-22

Family

ID=72837027


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103995593A (en) * 2014-05-22 2014-08-20 无锡爱维特信息技术有限公司 Dynamic position data uploading method based on acceleration action sensor
CN104683804A (en) * 2015-02-14 2015-06-03 北京航空航天大学 Parameter-adaptive multidimensional bit rate control method based on video content characteristics
US20180285736A1 (en) * 2017-04-04 2018-10-04 Hailo Technologies Ltd. Data Driven Quantization Optimization Of Weights And Input Data In An Artificial Neural Network
US20180341857A1 (en) * 2017-05-25 2018-11-29 Samsung Electronics Co., Ltd. Neural network method and apparatus
CN109190754A (en) * 2018-08-30 2019-01-11 北京地平线机器人技术研发有限公司 Quantitative model generation method, device and electronic equipment

Similar Documents

Publication Publication Date Title
US20200104693A1 (en) Processing method and accelerating device
TWI771539B (en) Apparatus and method for neural network operation
TWI793225B (en) Method for neural network training and related product
US11748601B2 (en) Integrated circuit chip device
TWI791725B (en) Neural network operation method, integrated circuit chip device and related products
TWI768159B (en) Integrated circuit chip apparatus and related product
EP3770824A1 (en) Computation method and related products of recurrent neural network
TWI767098B (en) Method for neural network forward computation and related product
TWI793224B (en) Integrated circuit chip apparatus and related product
WO2020211783A1 (en) Adjusting method for quantization frequency of operational data and related product
CN110490315B (en) Reverse operation sparse method of neural network and related products
CN111832709A (en) Mixed quantization method of operation data and related product
CN110490314B (en) Neural network sparseness method and related products
WO2019165946A1 (en) Integrated circuit chip device, board card and related product
CN111832710A (en) Method for adjusting quantization frequency of operation data and related product
CN111832712A (en) Method for quantizing operation data and related product
CN111832696A (en) Neural network operation method and related product
CN111832711A (en) Method for quantizing operation data and related product
CN111832695A (en) Method for adjusting quantization precision of operation data and related product
TWI768160B (en) Integrated circuit chip apparatus and related product
TWI767097B (en) Integrated circuit chip apparatus and related product
CN110472735A (en) The Sparse methods and Related product of neural network
TW201937415A (en) Integrated circuit chip device and related product has the advantages of small amount of calculation and low power consumption

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20790521

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 210122)

122 Ep: pct application non-entry in european phase

Ref document number: 20790521

Country of ref document: EP

Kind code of ref document: A1