WO2020211783A1 - Procédé d'ajustement de fréquence de quantification de données opérationnelles, et produit associé - Google Patents

Procédé d'ajustement de fréquence de quantification de données opérationnelles, et produit associé

Info

Publication number
WO2020211783A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
quantization accuracy
quantization
neural network
accuracy
Prior art date
Application number
PCT/CN2020/084943
Other languages
English (en)
Chinese (zh)
Inventor
刘少礼
张曦珊
曾洪博
孟小甫
Original Assignee
上海寒武纪信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201910307675.3A external-priority patent/CN111832696A/zh
Priority claimed from CN201910306480.7A external-priority patent/CN111832712A/zh
Priority claimed from CN201910306479.4A external-priority patent/CN111832711A/zh
Priority claimed from CN201910306477.5A external-priority patent/CN111832709A/zh
Priority claimed from CN201910307672.XA external-priority patent/CN111832695A/zh
Priority claimed from CN201910306478.XA external-priority patent/CN111832710A/zh
Application filed by 上海寒武纪信息科技有限公司 filed Critical 上海寒武纪信息科技有限公司
Publication of WO2020211783A1 publication Critical patent/WO2020211783A1/fr

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Definitions

  • The present disclosure relates to the field of neural networks, and in particular to a method for adjusting the quantization frequency of operation data and related products.
  • ANN: Artificial Neural Network
  • CPU: Central Processing Unit
  • GPU: Graphics Processing Unit
  • In existing solutions the operation data is quantized, but the quantization frequency is never changed, so the quantization parameters may stop matching the operation data; this weakens the effect of quantization and harms either the calculation accuracy or the amount of calculation.
  • The embodiments of the disclosure provide a method for adjusting the quantization frequency of operation data and related products, which can improve calculation accuracy or reduce the amount of calculation.
  • In a first aspect, a method for adjusting the quantization frequency of operation data is provided; the method is applied to an artificial intelligence processor and includes the following steps:
  • obtaining a quantization command, where the quantization command includes: the data type of the quantization accuracy and the quantization accuracy;
  • acquiring the training parameters of the neural network, and determining the adjustment frequency of the quantization accuracy according to the training parameters and the data type of the quantization accuracy, so that the artificial intelligence processor adjusts the quantization accuracy according to the adjustment frequency.
  • the operation data includes one of, or any combination of: input neuron A, output neuron B, weight W, the input neuron derivative, the output neuron derivative, and the weight derivative;
  • the data type of the quantization accuracy specifically includes: discrete quantization accuracy or continuous quantization accuracy.
  • the training parameters include: training period iteration or training period epoch.
  • the determining the adjustment frequency of the quantization accuracy according to the training parameter and the data type of the quantization accuracy specifically includes:
  • the method for determining the quantization accuracy specifically includes:
  • the determining s or f according to the maximum absolute value of the operation data specifically includes:
  • bitnum is the number of bits of the quantized data, and amax is the maximum absolute value of the operation data;
  • the determining s and f according to the minimum absolute value of the operation data specifically includes:
  • d is a constant, and amin is the minimum absolute value of the operation data.
  • the method for determining the maximum absolute value or the minimum absolute value of the operation data specifically includes:
  • the determining s and f according to the quantization precision of different data types specifically includes:
  • the determining s and f according to the empirical value constant specifically includes:
  • In a second aspect, an artificial intelligence processor is provided, which includes:
  • a processing unit, used to determine the operation data of the neural network;
  • the obtaining unit is configured to obtain a quantization command, the quantization command includes: the data type of the quantization accuracy and the quantization accuracy, and the training parameters of the neural network are obtained;
  • the processing unit is further configured to determine the adjustment frequency of the quantization accuracy according to the training parameter and the data type of the quantization accuracy, so that the artificial intelligence processor adjusts the quantization accuracy according to the adjustment frequency.
  • In a third aspect, a neural network computing device is provided, which includes one or more of the chips provided in the second aspect.
  • In a fourth aspect, a combined processing device is provided, which includes: the neural network computing device provided in the third aspect, a universal interconnection interface, and a universal processing device;
  • the neural network computing device is connected to the general processing device through the general interconnection interface.
  • An electronic device is provided, which includes the chip provided in the second aspect or the neural network computing device provided in the third aspect.
  • A computer-readable storage medium is provided, which stores a computer program for electronic data exchange, wherein the computer program causes a computer to execute the method provided in the first aspect.
  • a computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the method provided in the first aspect.
  • A method for quantizing operation data is provided.
  • the method is applied to an artificial intelligence processor.
  • the method includes the following steps:
  • A hybrid quantization method for operation data is provided, which is applied to an artificial intelligence processor; the method includes the following steps:
  • the operation data is divided into g groups of data, and according to the data type of the quantization accuracy a hybrid quantization operation is applied to the g groups of data to obtain quantized data, so that the artificial intelligence processor performs arithmetic operations according to the quantized data, where g is an integer greater than or equal to 2.
  • the operation data includes one of, or any combination of: input neuron A, output neuron B, weight W, the input neuron derivative, the output neuron derivative, and the weight derivative;
  • the data type of the quantization accuracy specifically includes: discrete quantization accuracy or continuous quantization accuracy.
  • applying a hybrid quantization operation to the g groups of data according to the data type of the quantization accuracy specifically includes:
  • using at least two data types of quantization accuracy to quantize the g groups of data, where the data type of the quantization accuracy within any single group of the g groups is the same.
  • the dividing the operation data into g groups of data specifically includes:
  • using at least two data types of quantization accuracy to quantize the g groups of data according to the data type of the quantization accuracy specifically includes:
  • a part of the g groups of data is quantized with discrete quantization accuracy, and another part of the g groups of data is quantized with continuous quantization accuracy.
  • extracting the quantization function corresponding to the data type from the calculation library specifically includes:
  • when the data type of the quantization accuracy is the exponent s of the discrete quantization accuracy, the quantized data is calculated according to the discrete quantization accuracy and the element values of the operation data; when the data type of the quantization accuracy is the continuous quantization accuracy f, the quantized data is calculated according to the continuous quantization accuracy and the element values of the operation data.
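  • As a concrete illustration of the hybrid scheme above, the following is a minimal sketch (the function names, the half/half group split, and the 8-bit scale bound are illustrative assumptions, not definitions from this publication):

```python
import numpy as np

def quantize_discrete(x, s, bitnum=8):
    # Discrete quantization accuracy: the scale is the power of two 2**s.
    qmax = 2 ** (bitnum - 1) - 1
    return np.clip(np.round(x / 2.0 ** s), -qmax - 1, qmax).astype(np.int32)

def quantize_continuous(x, f, bitnum=8):
    # Continuous quantization accuracy: the scale f is an arbitrary positive real.
    qmax = 2 ** (bitnum - 1) - 1
    return np.clip(np.round(x / f), -qmax - 1, qmax).astype(np.int32)

def hybrid_quantize(data, g=4):
    # Divide the operation data into g groups; quantize part of the groups with
    # discrete accuracy 2**s and the remaining groups with continuous accuracy f.
    groups = np.array_split(np.asarray(data, dtype=np.float64), g)
    quantized = []
    for i, grp in enumerate(groups):
        amax = max(np.abs(grp).max(), 1e-12)  # guard against all-zero groups
        if i < g // 2:
            s = int(np.ceil(np.log2(amax / 127.0)))
            quantized.append((quantize_discrete(grp, s), ("s", s)))
        else:
            f = amax / 127.0
            quantized.append((quantize_continuous(grp, f), ("f", f)))
    return quantized

result = hybrid_quantize(np.random.randn(1000), g=4)  # g >= 2, as required above
```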
  • before dividing the operation data into g groups of data, the method further includes determining the quantization accuracy of the operation data according to the operation data and the data type; determining the quantization accuracy of the operation data according to the operation data and the data type includes:
  • the determining s and f according to the maximum absolute value of the operation data specifically includes:
  • bitnum is the number of bits of the quantized data, and amax is the maximum absolute value of the operation data;
  • the determining s or f according to the minimum absolute value of the operation data specifically includes:
  • d is a constant, and amin is the minimum absolute value of the operation data.
  • the method for determining the maximum absolute value or the minimum absolute value of the operation data specifically includes:
  • the determining s and f according to the quantization accuracy of different data types specifically includes:
  • the determining s and f according to the empirical value constant specifically includes:
  • the method further includes:
  • the dynamic adjustment of s or f specifically includes:
  • the method further includes:
  • the dynamic adjustment of the trigger frequency of s and f specifically includes:
  • a step α, where α is greater than 1;
  • Figure 1 is a schematic diagram of a neural network training method.
  • Figure 2 is a schematic flowchart of a method for adjusting the quantization frequency of operation data.
  • Figure 3a is a schematic representation of discrete fixed-point data.
  • Figure 3b is a schematic representation of continuous fixed-point data.
  • Figure 4a is a schematic diagram of a chip.
  • Figure 4b is a schematic diagram of another chip.
  • Figure 5a is a schematic structural diagram of a combined processing device according to this disclosure.
  • Figure 5b is another schematic structural diagram of a combined processing device according to this disclosure.
  • Figure 5c is a schematic structural diagram of a neural network processor board provided by an embodiment of the disclosure.
  • Figure 5d is a schematic structural diagram of a neural network chip packaging structure provided by an embodiment of the disclosure.
  • Figure 5e is a schematic structural diagram of a neural network chip provided by an embodiment of the disclosure.
  • Figure 6 is a schematic diagram of a neural network chip packaging structure provided by an embodiment of the disclosure.
  • Figure 6a is a schematic diagram of another neural network chip packaging structure provided by an embodiment of the disclosure.
  • the quantization method of arithmetic data provided in this application can be run in a processor.
  • The aforementioned processor can be a general-purpose processor, such as a central processing unit (CPU), or a dedicated processor, such as a graphics processing unit (GPU); it can also be implemented in an artificial intelligence processor. This application does not limit the specific form of the processor.
  • Figure 1 is a training schematic diagram of a computing layer of a neural network provided by this application.
  • the computing layer can be a fully connected layer or a convolutional layer.
  • When the computing layer is a fully connected layer, it corresponds to a fully connected operation, such as the matrix multiplication operation shown in Figure 1.
  • When the computing layer is a convolutional layer, the corresponding operation is a convolution operation.
  • the training includes forward inference (abbreviated as inference) and reverse training.
  • the solid line shows the process of forward inference
  • the dashed line shows the process of reverse training.
  • the input data and weights of the computing layer are calculated to obtain the output data of the computing layer.
  • the output data can be the input data of the next layer of the computing layer.
  • the output data gradient and weight value of the arithmetic layer are calculated to obtain the input data gradient.
  • the output data gradient and the input data are calculated to obtain the weight gradient.
  • The weight gradient is used in the calculation of the computing layer to update the weight, and the input data gradient is used as the output data gradient of the next layer of the computing layer.
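  • To make the data flow of Figure 1 concrete, the following is a minimal NumPy sketch of one training step of a fully connected layer (the shapes and the SGD-style weight update are illustrative assumptions, not taken from this publication):

```python
import numpy as np

def forward(X, W):
    # Forward inference: input data and weights produce the output data.
    return X @ W

def backward(X, W, dY):
    # Reverse training: from the output data gradient dY,
    #   input data gradient  dX = dY @ W.T  (passed on as the previous layer's output gradient)
    #   weight gradient      dW = X.T @ dY  (used to update this layer's weight)
    return dY @ W.T, X.T @ dY

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 128))   # batch of input neurons
W = rng.standard_normal((128, 64))   # weight matrix
Y = forward(X, W)                    # output neurons, fed to the next layer
dY = rng.standard_normal(Y.shape)    # gradient arriving from the next layer
dX, dW = backward(X, W, dY)
W -= 0.01 * dW                       # weight update (illustrative SGD step)
```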
  • Figure 2 provides a method for adjusting the quantization frequency of calculation data.
  • The calculation may include forward inference and/or reverse training. The method is executed by a computing chip, which may be a general-purpose processor, such as a central processing unit or a graphics processor, or a dedicated neural network processor.
  • the above method can also be executed by a device including a computing chip.
  • the method includes the following steps:
  • Step S201: The computing chip determines the operation data of the neural network.
  • The operation data in the above step S201 includes, but is not limited to, one of, or any combination of: input neuron A, output neuron B, weight W, the input neuron derivative, the output neuron derivative, and the weight derivative.
  • Step S202: The computing chip obtains a quantization command, where the quantization command includes: the data type of the quantization accuracy and the quantization accuracy.
  • The data type of the quantization accuracy specifically includes: discrete quantization accuracy or continuous quantization accuracy f. The discrete quantization accuracy can be expressed as 2^s, where s is the exponent of the discrete quantization accuracy.
  • Figure 3a is a schematic representation of discrete fixed-point data
  • Figure 3b is a schematic representation of continuous fixed-point data.
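  • The exact quantization formulas are reproduced in the original only as figures; the sketch below merely illustrates the distinction between the two data types of quantization accuracy, assuming a symmetric rounding-and-clipping scheme (an assumption, not this publication's definition):

```python
import numpy as np

bitnum = 8                      # number of bits of the quantized data
qmax = 2 ** (bitnum - 1) - 1    # 127 for 8-bit signed data

x = np.array([0.03, -1.7, 0.42, 2.5])

# Discrete quantization accuracy: the scale is constrained to a power of two 2**s.
s = -5                                        # exponent of the discrete accuracy
q_disc = np.clip(np.round(x / 2.0 ** s), -qmax - 1, qmax)
x_disc = q_disc * 2.0 ** s                    # dequantized values

# Continuous quantization accuracy: the scale f may be any positive real number.
f = np.abs(x).max() / qmax
q_cont = np.clip(np.round(x / f), -qmax - 1, qmax)
x_cont = q_cont * f                           # dequantized values
```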
  • Step S203: The computing chip obtains the training parameters of the neural network, and determines the adjustment frequency of the quantization accuracy according to the training parameters and the data type of the quantization accuracy, so that the artificial intelligence processor adjusts the quantization accuracy according to the adjustment frequency.
  • the foregoing training parameters include: training period iteration or training period epoch.
  • The iteration training period may specifically be one training sample completing one iteration operation, that is, one training sample completing one forward operation and one reverse training pass. The epoch training period may specifically be all samples in the training set being trained once, that is, one forward operation and one reverse training pass performed for all samples in the training set.
  • In the technical solution provided by this application, after the operation data of the neural network is determined, the data type of the quantization accuracy and the quantization accuracy are determined through a quantization command; the training parameters are then obtained, and the adjustment frequency of the quantization accuracy is determined according to the training parameters, so that the artificial intelligence processor adjusts the quantization accuracy of the neural network data according to that frequency. This avoids a mismatch between the quantization accuracy and the operation data caused by leaving the quantization accuracy unadjusted for a long time, makes the quantization accuracy match the operation data more closely, and improves the calculation accuracy or reduces the amount of calculation.
  • the foregoing determining the adjustment frequency of the quantization accuracy according to the training parameters and the data type of the quantization accuracy specifically includes:
  • The quantization accuracy is adjusted once every n training cycles (iterations), where n is fixed; the adjustment method can refer to the above adjustment methods for s and f.
  • The iteration training period may specifically be one training sample completing one iteration operation, that is, one forward operation and one reverse training pass.
  • The above n can be set to different values according to the type of data; for one type of data it can be set to 100, and for another type it can be set to 20.
  • The epoch training period may specifically be: all samples in the training set are trained once, that is, one forward operation and one reverse training pass are performed for all samples in the training set.
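  • As a minimal sketch of this schedule (the per-type intervals of 100 and 20 and the recalibration rule inside the loop are illustrative assumptions):

```python
import numpy as np

ADJUST_INTERVAL = {"weight": 100, "neuron": 20}  # n per data type (illustrative)

def maybe_adjust_accuracy(iteration, data, data_type, state):
    # Every n training cycles, re-derive the quantization accuracy from the
    # current data so that the accuracy and the operation data stay matched.
    n = ADJUST_INTERVAL[data_type]
    if iteration % n == 0:
        amax = np.abs(data).max()
        if amax > 0:
            state["s"] = int(np.ceil(np.log2(amax / 127.0)))
    return state

state = {"s": 0}
for it in range(1, 501):
    weights = np.random.randn(1000) * (1 + it / 500)  # data range drifts over training
    state = maybe_adjust_accuracy(it, weights, "weight", state)
```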
  • the foregoing method for determining the quantization accuracy may specifically be any one of the following methods:
  • Method A: Determine s and f according to the maximum absolute value of the operation data.
  • The calculation chip determines the maximum absolute value amax of the operation data, and determines s or f by the corresponding formula, in which c is a constant that can be any rational number.
  • For example, c can be any rational number in the interval [1, 1.2]; more generally, c need only be a rational number.
  • The above bitnum is the number of bits of the quantized data; refer to Figures 3a and 3b, where Figure 3a is a schematic diagram of discrete fixed-point data and Figure 3b is a schematic diagram of continuous fixed-point data. For discrete fixed-point data, bitnum can be 8, 16, 24 or 32.
  • amax can be searched by data category; it can also be searched by layer, by category, or by group.
  • Method A1: The computing chip can search across all layers by data category to find the maximum absolute value. The computing chip determines each element of the data to be calculated, where the data can be the element values of input neuron A, output neuron B, weight W, the input neuron derivative, the output neuron derivative, or the weight derivative at the l-th layer; it then traverses all layers of the neural network to find the maximum absolute value of each category of data across all layers.
  • Method A2: The computing chip can search by layer and by category to find the maximum absolute value. The computing chip determines each element of the data to be calculated, where the data can be the element values of input neuron A, output neuron B, weight W, the input neuron derivative, the output neuron derivative, or the weight derivative at the l-th layer. The maximum absolute value of each category of data can be extracted per β layers, where β is an integer greater than or equal to 2.
  • Method A3: The computing chip can search by layer, by category, and by group to find the maximum absolute value. The computing chip determines each element of the data to be calculated, where the data can be input neuron A, output neuron B, weight W, the input neuron derivative, the output neuron derivative, or the weight derivative; each of these can represent one type of data.
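  • The formulas behind Methods A through D are shown in the publication only as images; the sketch below uses a common fixed-point heuristic consistent with the surrounding text (the exact expressions, the helper name, and the layer/category loop are assumptions, not the patent's formulas):

```python
import numpy as np

def accuracy_from_amax(data, bitnum=8, c=1.0):
    # Illustrative stand-ins for the unreproduced formulas: choose the scale
    # so that c * amax still fits within bitnum signed bits.
    amax = max(np.abs(data).max(), 1e-12)        # guard against all-zero data
    qmax = 2 ** (bitnum - 1) - 1
    f = c * amax / qmax                          # continuous accuracy f
    s = int(np.ceil(np.log2(c * amax / qmax)))   # discrete-accuracy exponent s
    return s, f

# Method A2-style search: maximum absolute value per layer and per data category.
layers = [
    {"neuron": np.random.randn(64), "weight": np.random.randn(64, 32)},
    {"neuron": np.random.randn(32), "weight": np.random.randn(32, 10)},
]
for l, categories in enumerate(layers):
    for category, values in categories.items():
        s, f = accuracy_from_amax(values, bitnum=8, c=1.1)  # c within [1, 1.2]
```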
  • Method B: Determine s and f according to the minimum absolute value of the operation data. The computing chip determines the minimum absolute value amin of the operation data, and determines the fixed-point accuracy s or f according to amin.
  • The accuracy s can be determined by formula 1-2, and the accuracy f can be determined by formula 2-2, where d is a constant that can be any rational number.
  • The above amin is searched by data category; it can also be searched by layer, by category, or by group. The specific search method can be the above method A1, A2 or A3, simply replacing amax in A1, A2 or A3 with amin.
  • Method C: The quantization accuracy of the l-th layer data type a(l) can be determined from the l-th layer data type b(l): s according to formula 1-3, and f according to formula 2-3. For formula 1-3, αb and βb are integer constants; for formula 2-3, αb and βb are rational constants. The above data type a(l) can be one of input neuron X(l), output neuron Y(l), weight W(l), the input neuron derivative, the output neuron derivative, and the weight derivative; the above data type b(l) can be another one of these.
  • Method D: The calculation chip determines s and f according to an empirical value constant: the fixed-point accuracy of the l-th layer data type a(l) can be set manually to a constant C, where C is an integer constant when determining s and a rational constant when determining f.
  • This application also provides an artificial intelligence processor, which includes:
  • a processing unit, used to determine the operation data of the neural network;
  • the obtaining unit is configured to obtain a quantization command, the quantization command includes: the data type of the quantization accuracy and the quantization accuracy, and the training parameters of the neural network are obtained;
  • the processing unit is further configured to determine the adjustment frequency of the quantization accuracy according to the training parameter and the data type of the quantization accuracy, so that the artificial intelligence processor adjusts the quantization accuracy according to the adjustment frequency.
  • The present disclosure also discloses a neural network computing device, which includes one or more chips as shown in Figure 4a or Figure 4b, and may also include one or more artificial intelligence processors. It is used to obtain the data to be calculated and control information from other processing devices, execute the specified neural network operation, and transfer the execution result to peripheral devices through the I/O interface. Peripheral devices include, for example, a camera, monitor, mouse, keyboard, network card, Wi-Fi interface, or server. When more than one chip as shown in Figure 4a or Figure 4b is included, the chips can be linked and transmit data through a specific structure, for example interconnected via the PCIE bus, to support larger-scale neural network operations.
  • The chips can share the same control system or have separate control systems; they can share memory, or each accelerator can have its own memory.
  • the interconnection mode can be any interconnection topology.
  • the neural network computing device has high compatibility and can be connected to various types of servers through a PCIE interface.
  • The present disclosure also discloses a combined processing device, which includes the above-mentioned neural network computing device, a universal interconnection interface, and other processing devices (i.e., universal processing devices).
  • the neural network computing device interacts with other processing devices to jointly complete the operation specified by the user.
  • Figure 5a is a schematic diagram of the combined processing device.
  • Other processing devices include one or more types of general-purpose/dedicated processors, such as a central processing unit (CPU), a graphics processing unit (GPU), a neural network processor, etc.
  • the number of processors included in other processing devices is not limited.
  • Other processing devices serve as the interface between the neural network computing device and external data and control, performing basic controls including data handling and starting and stopping the neural network computing device; other processing devices can also cooperate with the neural network computing device to complete computing tasks.
  • the universal interconnection interface is used to transmit data and control instructions between the neural network computing device and other processing devices.
  • The neural network computing device obtains the required input data from other processing devices and writes it to the storage device on the neural network computing device chip; it can obtain control instructions from other processing devices and write them to the control buffer on the neural network computing device chip; it can also read the data in the storage module of the neural network computing device and transmit it to other processing devices.
  • Optionally, the structure also includes a storage device for storing the data required by this arithmetic unit/arithmetic device or by other arithmetic units; it is especially suitable for data that is needed for calculation but cannot be fully stored in the internal storage of the neural network computing device or other processing devices.
  • The combined processing device can be used as an SOC (system-on-chip) for mobile phones, robots, drones, video surveillance equipment and other devices, effectively reducing the core area of the control part, increasing processing speed, and reducing overall power consumption.
  • In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, monitor, mouse, keyboard, network card, or Wi-Fi interface.
  • FIG. 5c is a schematic structural diagram of a neural network processor board provided by an embodiment of the disclosure.
  • the aforementioned neural network processor board 10 includes a neural network chip packaging structure 11, a first electrical and non-electrical connection device 12, and a first substrate 13.
  • The neural network chip packaging structure 11 includes: a neural network chip 111, a second electrical and non-electrical connection device 112, and a second substrate 113.
  • the specific form of the neural network chip 111 involved in this disclosure is not limited.
  • the neural network chip 111 mentioned above includes but is not limited to a neural network chip integrated with a neural network processor.
  • The chip can be made of silicon, germanium, quantum materials, or molecular materials. According to the actual situation (for example, a harsher environment) and different application requirements, the neural network chip can be packaged so that most of the chip is wrapped, while the pins on the chip are connected to the outer side of the packaging structure through conductors such as gold wires, for circuit connection with the outer layer.
  • the present disclosure does not limit the specific structure of the neural network chip 111.
  • The first substrate 13 and the second substrate 113 may be printed circuit boards (PCB), printed wiring boards (PWB), or other circuit boards. There are no restrictions on the materials used to make the PCB.
  • The second substrate 113 involved in the present disclosure is used to carry the aforementioned neural network chip 111; the neural network chip packaging structure 11, obtained by connecting the neural network chip 111 and the second substrate 113 through the second electrical and non-electrical connection device 112, is used to protect the neural network chip 111 and to facilitate the further packaging of the neural network chip packaging structure 11 with the first substrate 13.
  • The specific packaging method of the above-mentioned second electrical and non-electrical connection device 112 and the structure corresponding to that packaging method are not limited.
  • A suitable packaging method, with simple improvements, can be selected according to the actual situation and different application requirements, such as Flip Chip Ball Grid Array Package (FCBGAP), Low-profile Quad Flat Package (LQFP), Quad Flat Package with Heat sink (HQFP), Quad Flat Non-lead Package (QFN), or Fine-Pitch Ball Grid Array Package (FBGA).
  • Flip chip (Flip Chip) is suitable for situations that require high packaging area or are sensitive to wire inductance and signal transmission time.
  • wire bonding (Wire Bonding) packaging methods can be used to reduce costs and improve the flexibility of the packaging structure.
  • Ball Grid Array packaging can provide more pins, and the average lead length of the pins is short, which enables high-speed signal transmission.
  • In such cases, the package can be replaced by a Pin Grid Array (PGA), Zero Insertion Force (ZIF), Single Edge Contact Connection (SECC), Land Grid Array (LGA), etc.
  • Optionally, a Flip Chip Ball Grid Array packaging method is used to package the neural network chip 111 and the second substrate 113.
  • the above-mentioned neural network chip packaging structure includes: a neural network chip 21, a pad 22, a solder ball 23, a second substrate 24, a connection point 25 on the second substrate 24, and a pin 26.
  • The pad 22 is connected to the neural network chip 21; a solder ball 23 is formed by welding between the pad 22 and the connection point 25 on the second substrate 24, thereby connecting the neural network chip 21 and the second substrate 24, that is, realizing the packaging of the neural network chip 21.
  • The pin 26 is used to connect with a circuit external to the packaging structure (for example, the first substrate 13 on the neural network processor board 10), which enables the transmission of external and internal data and makes it convenient for the neural network chip 21, or the neural network processor corresponding to the neural network chip 21, to process the data.
  • The disclosure also does not limit the type and number of pins; different pin forms can be selected according to different packaging technologies and arranged in accordance with certain rules.
  • Optionally, the above-mentioned neural network chip packaging structure further includes an insulating filler, which is placed in the gaps between the pad 22, the solder ball 23 and the connection point 25 to prevent interference between solder balls.
  • the material of the insulating filler may be silicon nitride, silicon oxide or silicon oxynitride; interference includes electromagnetic interference, inductive interference, etc.
  • the aforementioned neural network chip packaging structure further includes a heat dissipation device for dissipating heat when the neural network chip 21 is running.
  • The heat dissipation device may be a metal sheet with good thermal conductivity, a heat sink, or a radiator, for example, a fan.
  • the neural network chip packaging structure 11 includes: a neural network chip 21, pads 22, solder balls 23, a second substrate 24, connection points 25 on the second substrate 24, pins 26, Insulating filler 27, heat dissipation paste 28 and metal shell heat sink 29.
  • the heat dissipation paste 28 and the metal shell heat sink 29 are used to dissipate the heat of the neural network chip 21 during operation.
  • the aforementioned neural network chip packaging structure 11 further includes a reinforcing structure connected to the pad 22 and buried in the solder ball 23 to enhance the connection strength between the solder ball 23 and the pad 22.
  • the reinforcing structure may be a metal wire structure or a columnar structure, which is not limited herein.
  • The present disclosure does not limit the specific form of the first electrical and non-electrical connection device 12; reference can be made to the description of the second electrical and non-electrical connection device 112: the neural network chip packaging structure 11 can be packaged by welding, or the second substrate 113 and the first substrate 13 can be connected by wire connection or plug-in connection, which facilitates the subsequent replacement of the first substrate 13 or the neural network chip packaging structure 11.
  • Optionally, the first substrate 13 includes an interface for a memory unit used to expand storage capacity, such as Synchronous Dynamic Random Access Memory (SDRAM) or Double Data Rate SDRAM (DDR); expanding the memory improves the processing capacity of the neural network processor.
  • The first substrate 13 may also include a Peripheral Component Interconnect Express (PCI-E or PCIe) interface, a small form-factor pluggable (SFP) interface, an Ethernet interface, a Controller Area Network (CAN) interface, etc., used for data transmission between the packaging structure and external circuits, which can improve computing speed and operational convenience.
  • The neural network processor is packaged as a neural network chip 111, the neural network chip 111 is packaged as a neural network chip packaging structure 11, and the neural network chip packaging structure 11 is packaged as a neural network processor board 10; the board performs data interaction with an external circuit (for example, a computer motherboard) through an interface (a socket or ferrule) on the board. That is, the function of the neural network processor is directly realized by using the neural network processor board 10, while the neural network chip 111 is protected.
  • In addition, other modules can be added to the neural network processor board 10, which improves the application range and computing efficiency of the neural network processor.
  • the present disclosure discloses an electronic device, which includes the neural network processor board 10 or the neural network chip packaging structure 11 described above.
  • Electronic devices include data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, cameras, servers, video cameras, projectors, watches, headsets, mobile storage, wearable devices, vehicles, household appliances, and/or medical equipment.
  • the transportation means include airplanes, ships, and/or vehicles;
  • the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • The medical equipment includes nuclear magnetic resonance instruments, B-mode ultrasound instruments, and/or electrocardiographs.

Abstract

A method for adjusting the quantization frequency of operational data, and a related product. The method comprises adjusting the quantization frequency of a neural network. The invention therefore offers the advantage of high calculation accuracy.
PCT/CN2020/084943 2019-04-16 2020-04-15 Procédé d'ajustement de fréquence de quantification de données opérationnelles, et produit associé WO2020211783A1 (fr)

Applications Claiming Priority (12)

Application Number Priority Date Filing Date Title
CN201910307675.3A CN111832696A (zh) 2019-04-16 2019-04-16 神经网络运算方法及相关产品
CN201910306480.7 2019-04-16
CN201910307672.X 2019-04-16
CN201910306480.7A CN111832712A (zh) 2019-04-16 2019-04-16 运算数据的量化方法及相关产品
CN201910306479.4A CN111832711A (zh) 2019-04-16 2019-04-16 运算数据的量化方法及相关产品
CN201910306477.5A CN111832709A (zh) 2019-04-16 2019-04-16 运算数据的混合量化方法及相关产品
CN201910306477.5 2019-04-16
CN201910306478.X 2019-04-16
CN201910307672.XA CN111832695A (zh) 2019-04-16 2019-04-16 运算数据的量化精度调整方法及相关产品
CN201910306479.4 2019-04-16
CN201910306478.XA CN111832710A (zh) 2019-04-16 2019-04-16 运算数据的量化频率调整方法及相关产品
CN201910307675.3 2019-04-16

Publications (1)

Publication Number Publication Date
WO2020211783A1 (fr)

Family

ID=72837027

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/084943 WO2020211783A1 (fr) 2019-04-16 2020-04-15 Procédé d'ajustement de fréquence de quantification de données opérationnelles, et produit associé

Country Status (1)

Country Link
WO (1) WO2020211783A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103995593A (zh) * 2014-05-22 2014-08-20 无锡爱维特信息技术有限公司 基于加速度动作感应器的动态位置数据上传方法
CN104683804A (zh) * 2015-02-14 2015-06-03 北京航空航天大学 基于视频内容特征的参数自适应多维码率控制方法
US20180285736A1 (en) * 2017-04-04 2018-10-04 Hailo Technologies Ltd. Data Driven Quantization Optimization Of Weights And Input Data In An Artificial Neural Network
US20180341857A1 (en) * 2017-05-25 2018-11-29 Samsung Electronics Co., Ltd. Neural network method and apparatus
CN109190754A (zh) * 2018-08-30 2019-01-11 北京地平线机器人技术研发有限公司 量化模型生成方法、装置和电子设备


Similar Documents

Publication Publication Date Title
US20200104693A1 (en) Processing method and accelerating device
TWI793225B (zh) 神經網絡訓練方法及相關產品
TWI771539B (zh) 神經網絡運算設備和方法
US11748601B2 (en) Integrated circuit chip device
TWI791725B (zh) 神經網絡運算方法、集成電路芯片裝置及相關產品
TWI768159B (zh) 集成電路芯片裝置及相關產品
EP3770824A1 (fr) Procédé de calcul et produits associés de réseau neuronal récurrent
TWI767098B (zh) 神經網絡正向運算方法及相關產品
TWI793224B (zh) 集成電路芯片裝置及相關產品
WO2020211783A1 (fr) Procédé d'ajustement de fréquence de quantification de données opérationnelles, et produit associé
CN110490315B (zh) 神经网络的反向运算稀疏方法及相关产品
CN111832709A (zh) 运算数据的混合量化方法及相关产品
CN111832695A (zh) 运算数据的量化精度调整方法及相关产品
CN110490314B (zh) 神经网络的稀疏方法及相关产品
WO2019165946A1 (fr) Dispositif à microcircuit intégré, carte de circuit imprimé et produit associé
CN111832710A (zh) 运算数据的量化频率调整方法及相关产品
CN111832712A (zh) 运算数据的量化方法及相关产品
CN111832696A (zh) 神经网络运算方法及相关产品
CN111832711A (zh) 运算数据的量化方法及相关产品
TWI768160B (zh) 集成電路芯片裝置及相關產品
TWI767097B (zh) 集成電路芯片裝置及相關產品
TWI795482B (zh) 集成電路芯片裝置及相關產品
CN110472735A (zh) 神经网络的稀疏方法及相关产品

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20790521

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 210122)

122 Ep: pct application non-entry in european phase

Ref document number: 20790521

Country of ref document: EP

Kind code of ref document: A1