CN112712164B - Non-uniform quantization method of neural network

Non-uniform quantization method of neural network

Info

Publication number
CN112712164B
CN112712164B (application CN202011616502.9A)
Authority
CN
China
Prior art keywords: point number, fixed point, data, lookup table, uniform quantization
Prior art date
Legal status: Active
Application number
CN202011616502.9A
Other languages
Chinese (zh)
Other versions
CN112712164A (en)
Inventor
黄宇扬
冯建豪
陈家麒
Current Assignee
Thinkforce Electronic Technology Co ltd
Original Assignee
Thinkforce Electronic Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Thinkforce Electronic Technology Co ltd
Priority to CN202011616502.9A
Publication of CN112712164A
Application granted
Publication of CN112712164B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06C DIGITAL COMPUTERS IN WHICH ALL THE COMPUTATION IS EFFECTED MECHANICALLY
    • G06C 3/00 Arrangements for table look-up, e.g. menstruation table
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/15 Correlation function computation including computation of convolution operations

Abstract

The invention discloses a non-uniform quantization method for a neural network. Input data is first quantized into a first fixed point number using a piecewise function, and the first fixed point number is stored. A first lookup table is searched to confirm the second fixed point number corresponding to the first fixed point number, the second fixed point number having a higher bit width than the first. A convolution operation is then performed using the second fixed point number to obtain a calculation result. Finally, a second lookup table is searched to convert the calculation result into a third fixed point number for storage, the data type of the third fixed point number being the same as that of the first fixed point number.

Description

Non-uniform quantization method of neural network
Technical Field
The invention relates to the technical field of neural networks, in particular to a non-uniform quantization method of a neural network.
Background
Artificial neural networks have made great progress in many areas: in fields such as pattern recognition, intelligent robotics, automatic control, prediction and estimation, biology, medicine, and economics, they have successfully solved practical problems that are difficult for modern computers, exhibiting good intelligent characteristics.
As model predictions become more accurate and networks grow deeper, the computational and memory resources that neural networks consume become a problem, especially on mobile devices. For example, deploying even a relatively small ResNet-50 network for classification requires a memory bandwidth of 3 GB/s to run the network model, and while the network runs it rapidly consumes memory, CPU, and battery, making on-device intelligence costly. As neural networks evolve, large networks have ever more layers and ever larger data volumes, which poses a significant challenge to the deployment of neural networks.
To address these issues, on the one hand, acceptable accuracy can be achieved with relatively small model sizes by designing more efficient network architectures; on the other hand, the network size can be reduced by compression, encoding, and the like. Quantization is one of the most widely used compression methods.
Neural network quantization can significantly improve the computational efficiency of neural networks, enabling them to be deployed on resource-limited chips or other computing platforms. Currently, the most common neural network quantization method projects high-precision floating point numbers onto low-precision quantized values; for example, 32-bit floating point numbers are converted into 8-bit fixed point numbers, and 8-bit data streams are used to store parameters such as input and output data and weights, so as to reduce bandwidth requirements.
However, in convolution operations, low-precision quantized values distort the original distribution of weights and activations in the neural network, so computing directly on low-bit data can greatly reduce network accuracy. Non-uniform quantization algorithms have been researched and proposed to address this problem; although they preserve network accuracy better, existing non-uniform quantization algorithms are complex and their computational load is difficult for a chip to bear, so in practice they neither effectively reduce the bandwidth requirement nor speed up inference.
Disclosure of Invention
Aiming at some or all of the problems in the prior art, the invention provides a non-uniform quantization method for a neural network, which comprises the following steps:
quantizing the input data into a first fixed point number using a piecewise function, and storing the first fixed point number;
searching a first lookup table to confirm the second fixed point number corresponding to the first fixed point number, wherein the second fixed point number has a higher bit width than the first fixed point number;
performing the convolution operation using the second fixed point number to obtain a calculation result; and
searching a second lookup table to convert the calculation result into a third fixed point number for storage, wherein the data type of the third fixed point number is the same as that of the first fixed point number.
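In outline, the four steps can be sketched as follows (a minimal sketch in Python/NumPy; forward_layer, quantize_piecewise, lut1 and lut2 are illustrative names rather than the patent's notation, and range handling is elided):

    import numpy as np

    def forward_layer(x, w_q, lut1, lut2, quantize_piecewise):
        # x: floating-point input activations
        # w_q: weights already held as high-bit fixed-point integers
        # lut1: 256-entry array, 8-bit code -> high-bit fixed-point value
        # lut2: array mapping each possible high-bit result -> 8-bit code
        q1 = quantize_piecewise(x)            # quantize inputs to 8-bit codes (vectorized quantizer assumed)
        q2 = lut1[q1]                         # look up the higher-bit values
        acc = np.convolve(q2, w_q)            # integer-only convolution (1-D for illustration)
        acc = np.clip(acc, 0, len(lut2) - 1)  # keep results inside the table's range
        return lut2[acc]                      # convert back to 8-bit codes for storage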
Further, the data type of the first fixed point number is an 8-bit fixed point number.
Further, the first lookup table and the second lookup table are configured inside the chip.
Further, each segment of the piecewise function is a linear function with the same or a different slope, the slope being determined by the upper and lower bounds of the segment:
$$scale_i = \frac{r_{i2} - r_{i1}}{q_{i2} - q_{i1}},$$

wherein $[r_{i1}, r_{i2}]$ is the value range of the input data in segment i, and $[q_{i1}, q_{i2}]$ is the value range of the first fixed point number for input data in that range.
Further, the piecewise function is:

$$q = \operatorname{round}\!\left(\frac{r - r_{i1}}{scale_i}\right) + q_{i1}, \qquad r \in [r_{i1}, r_{i2}),$$

taken over the three segments $[-f_1, -k f_1)$, $[-k f_1, k f_1)$ and $[k f_1, f_1]$, wherein $[-f_1, f_1]$ is the value range of the input data r, and k is an arbitrary number between 0 and 1.
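As a worked instantiation under assumed values $f_1 = 1$ and $k = 0.125$, with the 8-bit segment code ranges $[0, 63]$, $[64, 191]$ and $[192, 255]$ used later in the detailed description, the three slopes come out as:

$$scale_1 = \frac{-0.125 - (-1)}{63 - 0} \approx 0.0139, \qquad scale_2 = \frac{0.125 - (-0.125)}{191 - 64} \approx 0.00197, \qquad scale_3 = \frac{1 - 0.125}{255 - 192} \approx 0.0139,$$

so the middle segment, where input values typically concentrate, is quantized roughly seven times more finely than the two outer segments.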
Further, the forming of the first lookup table includes:
confirming the value range of the input data, and quantizing the data in that range according to the piecewise function to obtain first data;
dequantizing the first data into floating point numbers according to the inverse of the piecewise function;
quantizing the floating point numbers into second data; and
configuring the first data and the corresponding second data into the first lookup table.
Further, the forming of the second lookup table includes:
estimating the value range of the convolution calculation result;
establishing a mapping table from low-bit fixed point numbers to high-bit fixed point numbers in that value range; and
for each possible high-bit fixed point value, finding the nearest mapped number in the mapping table, to form the second lookup table.
In the embodiments of the invention, the low-bit fixed point number refers to the data type used for storing input data and calculation results, while the high-bit fixed point number refers to a data type with a higher bit width than the low-bit fixed point number, used mainly for the convolution operation.
With the non-uniform quantization method of the invention, the input data of each layer is stored in a simple non-uniformly quantized form, and for the core convolution operation of the neural network a higher-bit value is first looked up on-chip through a pre-configured lookup table, after which the computation is carried out at the higher bit width; accuracy is thus preserved without greatly increasing the bandwidth or the load on computing resources. Because the lookup tables are configured in advance by off-chip computation, all operations of high-precision neural network training or inference can be performed inside the neural network chip without floating point computation, which reduces the computing power required of the host CPU and saves chip resources.
Drawings
To further clarify the above and other advantages and features of embodiments of the present invention, a more particular description of embodiments of the present invention will be rendered by reference to the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. In the drawings, the same or corresponding parts will be denoted by the same or similar reference numerals for clarity.
Fig. 1 is a flow chart illustrating a non-uniform quantization method of a neural network according to an embodiment of the present invention.
Detailed Description
In the following description, the present invention is described with reference to examples. One skilled in the relevant art will recognize, however, that the embodiments may be practiced without one or more of the specific details, or with other alternative and/or additional methods, materials, or components. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention. Similarly, for purposes of explanation, specific numbers, materials and configurations are set forth in order to provide a thorough understanding of the embodiments of the invention. However, the invention is not limited to these specific details. Further, it should be understood that the embodiments shown in the figures are illustrative representations and are not necessarily drawn to scale.
Reference in the specification to "one embodiment" or "the embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
It should be noted that the embodiment of the present invention describes the process steps in a specific order, however, this is only for the purpose of illustrating the specific embodiment, and does not limit the sequence of the steps. Rather, in various embodiments of the present invention, the order of the steps may be adjusted according to process adjustments.
To reduce bandwidth requirements, low bit widths, e.g. 8-bit data streams, are often used in neural networks to store data, weights, and the like. However, in the convolution operation at the core of the neural network, computing directly at a low bit width easily loses network accuracy, so the low-bit values usually have to be dequantized back into floating point numbers before accurate computation can be performed. In a convolution operation, the input is typically expanded into a matrix according to the convolution kernel size, after which the convolution becomes a matrix multiplication whose kernel computation is a dot product of rows and columns. The dot product acts on two equal-length vectors and can be described by the following formula:
$$c = \sum_{i=1}^{n} a_i \cdot b_i$$
the length of a and b is n, and the length of C is the result of dot product, so it can be seen that if the floating point number is used for dot product operation, the resource consumption is more, and the occupied bandwidth is larger, which is easy to become the bottleneck of calculation. In view of this problem, the present invention provides a non-uniform quantization method for neural networks, and the scheme of the present invention is further described below with reference to the accompanying drawings of embodiments.
Fixed-point and floating-point are both numerical representations; they differ in where the point separating the integer part from the fractional part sits. A fixed-point format stores an integer part and a fractional part with fixed bit widths, while a floating-point format stores a significand and an exponent of fixed bit widths. Taking the 8-bit fixed-point integer format INT8 and the 32-bit floating-point format FP32 as examples, INT8 uses only 25% of the bits of FP32; the method of converting values between INT8 and FP32 is therefore very important, because it significantly affects prediction accuracy.
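As a small illustration of such a conversion (a plain uniform INT8 mapping over an assumed range [-1, 1], not yet the patent's piecewise scheme):

    import numpy as np

    r = np.float32(0.1234)                  # original FP32 value
    scale = 2.0 / 255.0                     # (r_max - r_min) / (q_max - q_min)
    zero_point = 128                        # the code that represents r = 0
    q = int(np.clip(round(r / scale) + zero_point, 0, 255))   # FP32 -> INT8
    r_back = scale * (q - zero_point)       # INT8 -> FP32
    print(q, r_back, abs(r - r_back))       # 144, ~0.1255, error of about 0.002

The roughly 0.002 reconstruction error here is uniform across the whole range; the non-uniform scheme below instead spends more codes where the data actually lies.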
Fig. 1 is a flow chart illustrating a non-uniform quantization method of a neural network according to an embodiment of the present invention. As shown in fig. 1, a non-uniform quantization method of a neural network includes:
first, in step 101, a look-up table is configured. In order to reduce the amount of calculation in the neural network chip, the corresponding relation between the low bit count and the high bit count is calculated in advance outside the chip to form a first lookup table and a second lookup table, and the first lookup table and the second lookup table are configured in the chip. The first lookup table is used for converting input data in the form of low fixed point numbers into high fixed point numbers, and the second lookup table is used for converting calculation results in the form of high fixed point numbers into low fixed point numbers. In one embodiment of the invention, the forming of the first lookup table comprises:
firstly, confirming the value range of the input data, and quantizing the data in that range according to a piecewise function to obtain first data; typically, the input data takes the form of 32-bit floating point numbers, and in one embodiment of the invention the data type of the first data is an 8-bit fixed point number; in a further embodiment of the present invention, the segments of the piecewise function are linear functions with the same or different slopes:
$$scale_i = \frac{r_{i2} - r_{i1}}{q_{i2} - q_{i1}},$$

wherein $[r_{i1}, r_{i2}]$ is the value range of the input data in segment i, and $[q_{i1}, q_{i2}]$ is the corresponding value range of the first data. In an embodiment of the present invention, the piecewise function includes three segments:
$$q = \operatorname{round}\!\left(\frac{r - r_{i1}}{scale_i}\right) + q_{i1}, \qquad r \in [r_{i1}, r_{i2}),$$

over the segments $[-f_1, -k f_1)$, $[-k f_1, k f_1)$ and $[k f_1, f_1]$, wherein $[-f_1, f_1]$ is the value range of the input data r, and k is any number between 0 and 1 (0.125 in one embodiment of the invention). The value range of q is $[0, 2^n - 1]$, where n is the bit width of the first data. Considering that the input data is distributed more densely in the middle of its value range, $scale_2$ is smaller than $scale_1$ and $scale_3$ in the piecewise function; that is, more quantization points are used in the second segment. Taking 8 bits as an example, the q ranges of the three segments are $[0, 63]$, $[64, 191]$ and $[192, 255]$ (a code sketch of this quantizer follows the lookup-table steps below). It should be understood that in other embodiments of the present invention, the segment boundaries and/or the number of segments may be set differently according to the distribution of the input data;
next, dequantizing the first data into floating point numbers using the inverse function of the piecewise function;
then, quantizing the floating point numbers into second data, wherein the bit width of the second data is higher than that of the first data and is chosen as required after weighing chip area against quantization performance, e.g. a 10- to 12-bit fixed point number; in one embodiment of the invention, the floating point numbers are quantized into the second data using a uniform quantization algorithm, specifically a linear function. The original floating point number r and the quantized value q are related as follows:

$$q = \operatorname{round}\!\left(\frac{r}{scale}\right) + zeroPoint, \qquad r = scale \cdot (q - zeroPoint),$$

wherein scale and zeroPoint are obtained from the upper and lower bounds of the quantization interval. In one embodiment of the invention, assuming the range of the floating point numbers to be quantized is $[-f_1, f_1]$ and the range of the second data is $[0, i_1]$, then:

$$scale = \frac{2 f_1}{i_1}, \qquad zeroPoint = \operatorname{round}\!\left(\frac{f_1}{scale}\right);$$

and
and finally, configuring the first data and the corresponding second data into the first lookup table. In one embodiment of the invention, the forming of the second lookup table comprises:
firstly, estimating the value range of the neural network convolution calculation result;
then, within that range, establishing a mapping table from low-bit fixed point numbers to high-bit fixed point numbers, in the same way as for the first lookup table; here the low-bit fixed point number is the data type used when storing input data and calculation results, i.e. the same type as the first data, while the high-bit fixed point number is a data type with a higher bit width than the low-bit fixed point number, used for the convolution operation, i.e. the same type as the second data; and
finally, for each possible high-bit fixed point value, finding the nearest mapped number in the mapping table, thereby establishing the second lookup table.
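Under the assumptions above (8-bit first data, a 12-bit second data type, and the three-segment function with $f_1 = 1.0$ and $k = 0.125$), both tables could be precomputed off-chip roughly as in the following Python sketch; all names (SEGMENTS, quantize_piecewise, lut1, lut2, and so on) are illustrative rather than the patent's notation, and the rounding conventions are assumptions:

    import numpy as np

    F1, K, I1 = 1.0, 0.125, 4095              # assumed float range, split point, 12-bit code max
    SCALE = 2 * F1 / I1                        # uniform scale for the second data
    ZERO_POINT = round(F1 / SCALE)             # 2048 here

    # (r_low, r_high, q_low, q_high) per segment; the slope follows from the bounds
    SEGMENTS = [(-F1, -K * F1, 0, 63),
                (-K * F1, K * F1, 64, 191),    # middle segment: smaller scale, more points
                (K * F1, F1, 192, 255)]

    def quantize_piecewise(r):
        # step 102: map a float in [-F1, F1] to an 8-bit code, finer in the middle
        r = min(max(r, -F1), F1)
        for r_lo, r_hi, q_lo, q_hi in SEGMENTS:
            if r <= r_hi:
                scale = (r_hi - r_lo) / (q_hi - q_lo)
                return q_lo + round((r - r_lo) / scale)
        return 255

    def dequantize_piecewise(q8):
        # inverse of the piecewise function: 8-bit code back to a float
        for r_lo, r_hi, q_lo, q_hi in SEGMENTS:
            if q8 <= q_hi:
                return r_lo + (q8 - q_lo) * (r_hi - r_lo) / (q_hi - q_lo)
        return F1

    # First lookup table: 8-bit code -> nearest 12-bit uniform fixed-point value.
    lut1 = np.clip([round(dequantize_piecewise(q) / SCALE) + ZERO_POINT
                    for q in range(256)], 0, I1).astype(np.int32)

    # Second lookup table: each possible 12-bit value -> the 8-bit code whose lut1
    # entry is closest (the "nearest mapped number" in the mapping table).
    lut2 = np.array([int(np.argmin(np.abs(lut1 - v))) for v in range(I1 + 1)],
                    dtype=np.uint8)

For example, quantize_piecewise(0.0) gives code 128, lut1[128] gives a 12-bit value near 2048, and lut2 maps such a value back to a code near 128, closing the store / look-up / store loop.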
next, at step 102, the input data is stored: the input data is quantized into a first fixed point number using the piecewise function, and the first fixed point number is stored; in one embodiment of the present invention, each segment of the piecewise function is a linear function with the same or a different slope, the slope being determined by the value range of the input data in the segment and the data type of the first fixed point number; in yet another embodiment of the present invention, the data type of the first fixed point number is an 8-bit fixed point number;
next, at step 103, the high-bit fixed point number is looked up: the second fixed point number corresponding to the first fixed point number is confirmed through the first lookup table, the second fixed point number having a higher bit width than the first fixed point number;
next, at step 104, the convolution operation is performed using the second fixed point number to obtain a calculation result. Because the input data has by this point been converted into high-bit fixed point numbers, the loss of neural network accuracy is reduced on the one hand; on the other hand, the dot product becomes a pure fixed-point multiply-add operation carried out entirely at the integer level, which greatly reduces the bandwidth demand (a sketch follows step 105 below):

$$q_c = zeroPoint_c + \frac{scale_a \cdot scale_b}{scale_c} \sum_{i=1}^{n} \left(q_{a,i} - zeroPoint_a\right)\left(q_{b,i} - zeroPoint_b\right),$$

wherein $scale_a$, $scale_b$, $scale_c$, $zeroPoint_a$, $zeroPoint_b$ and $zeroPoint_c$ are quantization parameters; and
finally, in step 105, the calculation result is stored: by searching the second lookup table, the convolution calculation result is converted into a third fixed point number for storage, the data type of the third fixed point number being the same as that of the first fixed point number.
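Putting steps 104 and 105 together, the integer-level multiply-add with the rescale folded into one precomputed multiplier can be sketched as follows (the names and the final rounding here are assumptions; in the patent's flow the conversion of the high-bit result back to a low-bit code is what the second lookup table performs on-chip, so no floating-point multiply is needed there):

    import numpy as np

    def int_dot_requant(qa, qb, zp_a, zp_b, zp_c, multiplier):
        # qa, qb: equal-length integer arrays of high-bit fixed-point codes
        # multiplier: scale_a * scale_b / scale_c, precomputed off-chip
        acc = np.dot(qa.astype(np.int64) - zp_a,       # pure integer
                     qb.astype(np.int64) - zp_b)       # multiply-accumulate
        return int(round(zp_c + multiplier * acc))     # requantized result q_c

    qa = np.array([100, 2000, 3000])
    qb = np.array([2048, 2100, 1900])
    print(int_dot_requant(qa, qb, 2048, 2048, 2048, 1e-3))   # 1905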
In summary, with the non-uniform quantization method of the invention, the input data of each layer is stored in a simple non-uniformly quantized form, and for the core convolution operation a higher-bit value is first looked up on-chip through the pre-configured lookup table before computing at the higher bit width, so that accuracy is preserved without greatly increasing the bandwidth or the load on computing resources.
Embodiments may be provided as a computer program product that may include one or more machine-readable media having stored thereon machine-executable instructions that, when executed by one or more machines such as a computer, network of computers, or other electronic devices, may result in the one or more machines performing operations in accordance with embodiments of the present invention. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (compact disc read-only memories), and magneto-optical disks, ROMs (read-only memories), RAMs (random access memories), EPROMs (erasable programmable read-only memories), EEPROMs (electrically erasable programmable read-only memories), magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing machine-executable instructions.
Moreover, embodiments may be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of one or more data signals embodied in and/or modulated by a carrier wave or other propagation medium via a communication link (e.g., a modem and/or network connection). Accordingly, a machine-readable medium as used herein may include, but is not required to be, such a carrier wave.
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various combinations, modifications, and changes can be made thereto without departing from the spirit and scope of the invention. Thus, the breadth and scope of the present invention disclosed herein should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims (9)

1. A method for non-uniform quantization of a neural network, comprising the steps of:
quantizing the input data into a first fixed point number by adopting a piecewise function, and storing the first fixed point number;
searching a first lookup table, and confirming a second fixed point number corresponding to the first fixed point number, wherein the bit number of the second fixed point number is higher than that of the first fixed point number;
performing convolution operation by adopting the second fixed point number to obtain a calculation result; and
and searching a second lookup table, and converting the calculation result into a third fixed point number for storage, wherein the data type of the third fixed point number is the same as that of the first fixed point number, and the first lookup table and the second lookup table are configured inside a chip.
2. The non-uniform quantization method of claim 1, wherein the data type of the first fixed-point number is an 8-bit fixed-point number.
3. The non-uniform quantization method of claim 1, wherein each segment of the piecewise function is a linear function with the same or a different slope:

$$scale_i = \frac{r_{i2} - r_{i1}}{q_{i2} - q_{i1}},$$

wherein $[r_{i1}, r_{i2}]$ is the value range of the input data in segment i, and $[q_{i1}, q_{i2}]$ is the value range of the first fixed point number for input data in that range.
4. The non-uniform quantization method of claim 3, wherein the piecewise function is:
$$q = \operatorname{round}\!\left(\frac{r - r_{i1}}{scale_i}\right) + q_{i1}, \qquad r \in [r_{i1}, r_{i2}),$$

over the segments $[-f_1, -k f_1)$, $[-k f_1, k f_1)$ and $[k f_1, f_1]$, wherein $[-f_1, f_1]$ is the value range of the input data r, and k is an arbitrary number between 0 and 1.
5. The non-uniform quantization method of claim 1, wherein the forming of the first lookup table comprises the steps of:
confirming a value range of input data, and quantizing the data in the value range according to the piecewise function to obtain first data;
dequantizing the first data to a floating point number according to the piecewise function;
quantizing the floating point number to a second data; and
and configuring the first data and the corresponding second data into a first lookup table.
6. The non-uniform quantization method of claim 5, wherein the floating point number is quantized into the second data using a linear function, and the original floating point number r and the quantized second data q satisfy:

$$q = \operatorname{round}\!\left(\frac{r}{scale}\right) + zeroPoint, \qquad r = scale \cdot (q - zeroPoint),$$

wherein scale and zeroPoint are calculated from the upper and lower bounds of the quantization interval.
7. The non-uniform quantization method of claim 1, wherein the forming of the second lookup table comprises the steps of:
estimating the value range of the convolution calculation result;
establishing a mapping table from low-bit fixed point numbers to high-bit fixed point numbers in the value range; and
for each possible high-bit fixed point value, finding the nearest mapped number in the mapping table to form a second lookup table.
8. A computer-readable storage medium comprising instructions that, when executed, cause a system to perform the method of any of claims 1-7.
9. A non-uniform quantization system for neural networks, comprising:
a memory; and
a processor coupled to the memory and configured to perform the method of any of claims 1-7.
CN202011616502.9A 2020-12-30 2020-12-30 Non-uniform quantization method of neural network Active CN112712164B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011616502.9A CN112712164B (en) 2020-12-30 2020-12-30 Non-uniform quantization method of neural network


Publications (2)

Publication Number Publication Date
CN112712164A CN112712164A (en) 2021-04-27
CN112712164B (en) 2022-08-26

Family

Family ID: 75547390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011616502.9A Active CN112712164B (en) 2020-12-30 2020-12-30 Non-uniform quantization method of neural network

Country Status (1)

Country Link
CN (1) CN112712164B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022236588A1 (en) * 2021-05-10 2022-11-17 Huawei Technologies Co., Ltd. Methods and systems for generating integer neural network from a full-precision neural network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852434A (en) * 2019-09-30 2020-02-28 成都恒创新星科技有限公司 CNN quantization method, forward calculation method and device based on low-precision floating point number
CN110929865A (en) * 2018-09-19 2020-03-27 深圳云天励飞技术有限公司 Network quantification method, service processing method and related product

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679618B (en) * 2017-07-28 2021-06-11 赛灵思电子科技(北京)有限公司 Static strategy fixed-point training method and device
US11562247B2 (en) * 2019-01-24 2023-01-24 Microsoft Technology Licensing, Llc Neural network activation compression with non-uniform mantissas
CN110058883B (en) * 2019-03-14 2023-06-16 梁磊 CNN acceleration method and system based on OPU


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fixed Point Quantization of Deep Convolutional Networks; Darryl D. Lin, et al.; arXiv; 2016-06-02; full text *
FPGA-based real-time enhancement system for 1080P low-quality video (基于FPGA的1080P低质视频实时增强系统); 魏苗 et al.; Computer Technology and Development (计算机技术与发展); 2017-06-05; full text *

Also Published As

Publication number Publication date
CN112712164A (en) 2021-04-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant