CN112712164A - Non-uniform quantization method of neural network - Google Patents

Non-uniform quantization method of neural network

Info

Publication number
CN112712164A
Authority
CN
China
Prior art keywords
point number
fixed point
data
lookup table
uniform quantization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011616502.9A
Other languages
Chinese (zh)
Other versions
CN112712164B (en)
Inventor
黄宇扬
冯建豪
陈家麒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Thinkforce Electronic Technology Co ltd
Original Assignee
Thinkforce Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thinkforce Electronic Technology Co ltd filed Critical Thinkforce Electronic Technology Co ltd
Priority to CN202011616502.9A
Publication of CN112712164A
Application granted
Publication of CN112712164B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06C DIGITAL COMPUTERS IN WHICH ALL THE COMPUTATION IS EFFECTED MECHANICALLY
    • G06C 3/00 Arrangements for table look-up, e.g. menstruation table
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Mathematics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a non-uniform quantization method for a neural network. Input data are first quantized into a first fixed point number using a piecewise function, and the first fixed point number is stored. A first lookup table is then searched to determine a second fixed point number corresponding to the first fixed point number, the bit width of the second fixed point number being higher than that of the first fixed point number. A convolution operation is performed using the second fixed point number to obtain a calculation result. Finally, a second lookup table is searched to convert the calculation result into a third fixed point number for storage, the data type of the third fixed point number being the same as that of the first fixed point number.

Description

Non-uniform quantization method of neural network
Technical Field
The invention relates to the technical field of neural networks, in particular to a non-uniform quantization method of a neural network.
Background
The application of artificial neural networks has made great progress in many areas: in fields such as pattern recognition, intelligent robotics, automatic control, prediction and estimation, biology, medicine and economics, they have successfully solved many practical problems that are difficult for modern computers and have shown good intelligent characteristics.
As model prediction becomes more accurate and networks become deeper, the computational and memory resources consumed by neural networks become a problem, especially on mobile devices. For example, deploying even a relatively small classification network such as ResNet-50 requires a memory bandwidth of about 3 GB/s, and memory, CPU and battery are consumed rapidly while the network runs, so making a device intelligent comes at a high cost. As neural networks evolve, large networks contain ever more layers and data, which poses a significant challenge to their deployment.
To address these issues, on the one hand, acceptable accuracy can be achieved with relatively small model sizes by designing more efficient network architectures; on the other hand, the network size can be reduced by compression, encoding, and the like. Quantization is one of the most widely used compression methods.
Neural network quantization can significantly improve the computational efficiency of neural networks, enabling them to be deployed on resource-limited chips or other computing platforms. Currently, the most common neural network quantization method projects high-precision floating point numbers onto low-precision quantized values; for example, 32-bit floating point numbers are converted into 8-bit fixed point numbers, and 8-bit data streams are used to store parameters such as input and output data and weights, so as to reduce bandwidth requirements.
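As an illustration of this common uniform (linear) scheme, the following minimal sketch, which is not part of the claimed method and whose ranges and names are chosen only for illustration, projects 32-bit floating point values onto 8-bit codes with a single scale and zero point:

```python
import numpy as np

def uniform_quantize(r, r_min, r_max, n_bits=8):
    """Affine mapping r = scale * (q - zero_point), solved for q."""
    q_max = 2 ** n_bits - 1
    scale = (r_max - r_min) / q_max
    zero_point = int(round(-r_min / scale))
    q = np.clip(np.round(r / scale) + zero_point, 0, q_max)
    return q.astype(np.uint8), scale, zero_point

def uniform_dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

# Example: 32-bit floats drawn from [-1, 1], stored as 8-bit codes.
r = np.random.uniform(-1.0, 1.0, size=16).astype(np.float32)
q, scale, zp = uniform_quantize(r, -1.0, 1.0)
print("max round-trip error:", np.abs(r - uniform_dequantize(q, scale, zp)).max())
```

With a single scale for the whole range, every value is represented with the same step size; this is exactly what a non-uniform scheme relaxes.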
However, using low-precision quantized values in the convolution operation destroys the original distribution of the weights and activations in the neural network, so computing directly with low-bit data may greatly reduce network accuracy. Non-uniform quantization algorithms have been studied and proposed to address this problem. Although non-uniform quantization preserves network accuracy better, the existing non-uniform quantization algorithms are complex and their computational load is difficult for a chip to bear, so in practice they neither effectively reduce the bandwidth requirement nor speed up inference.
Disclosure of Invention
Aiming at some or all of the problems in the prior art, the invention provides a non-uniform quantization method for a neural network, which comprises the following steps:
quantizing the input data into a first fixed point number by adopting a piecewise function, and storing the first fixed point number;
searching a first lookup table, and confirming a second fixed point number corresponding to the first fixed point number, wherein the bit number of the second fixed point number is higher than that of the first fixed point number;
performing convolution operation by adopting the second fixed point number to obtain a calculation result; and
searching a second lookup table, and converting the calculation result into a third fixed point number for storage, wherein the data type of the third fixed point number is the same as that of the first fixed point number.
Further, the data type of the first fixed point number is an 8-bit fixed point number.
Further, the first lookup table and the second lookup table are configured inside the chip.
Further, each segment of the piecewise function is a linear function with the same or a different slope, and the slope is determined according to the upper and lower bounds of each segment:

scale_i = (r_i2 - r_i1) / (q_i2 - q_i1)

where [r_i1, r_i2] is the value range of the input data in the i-th segment, and [q_i1, q_i2] is the corresponding value range of the first fixed point number.
Further, the piecewise function is:

q = round((r - r_i1) / scale_i) + q_i1, for r in the i-th of the segments [-f1, -k*f1], [-k*f1, k*f1] and [k*f1, f1],

where [-f1, f1] is the value range of the input data r, and k is an arbitrary number between 0 and 1.
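As a minimal sketch of one possible reading of this three-segment scheme (the segment boundaries and the 8-bit code ranges below follow the example given later in the detailed description and are assumptions for illustration, not the only form the piecewise function may take):

```python
import numpy as np

def piecewise_quantize(r, f1, k=0.125):
    """Three-segment non-uniform quantizer over [-f1, f1] (illustrative sketch).

    The middle segment [-k*f1, k*f1] is mapped to the wider code range
    [64, 191], so densely distributed mid-range values get more codes;
    the outer segments share the remaining codes [0, 63] and [192, 255].
    """
    segments = [(-f1, -k * f1, 0, 63),      # (r_lo, r_hi, q_lo, q_hi)
                (-k * f1, k * f1, 64, 191),
                (k * f1, f1, 192, 255)]
    r = np.clip(np.asarray(r, dtype=np.float32), -f1, f1)
    q = np.zeros(r.shape, dtype=np.uint8)
    for r_lo, r_hi, q_lo, q_hi in segments:
        scale = (r_hi - r_lo) / (q_hi - q_lo)   # per-segment slope
        mask = (r >= r_lo) & (r <= r_hi)
        q[mask] = np.clip(np.round((r[mask] - r_lo) / scale) + q_lo, q_lo, q_hi)
    return q
```

Compared with a uniform mapping, values near zero are resolved more finely at the cost of coarser steps near the ends of the range.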
Further, the forming of the first lookup table includes:
confirming a value range of input data, and quantizing the data in the value range according to the piecewise function to obtain first data;
dequantizing the first data to a floating point number according to the piecewise function;
quantizing the floating point number into second data; and
configuring the first data and the corresponding second data into the first lookup table.
Further, the forming of the second lookup table includes:
estimating the value range of the convolution calculation result;
establishing a mapping table from low-bit fixed point numbers to high-bit fixed point numbers in the value range; and
for each possible high-bit fixed-point value, finding the nearest mapped number in the mapping table to form a second lookup table.
In the embodiment of the present invention, the low-bit fixed point number refers to a data type used for storing input data and a calculation result, and the high-bit fixed point number refers to a data type with a bit number higher than that of the low-bit fixed point number, and is mainly used for convolution operation.
According to the non-uniform quantization method of the present invention, the input data of each layer are stored in a simple non-uniformly quantized form, and when the core convolution operation of the neural network is performed, a higher-bit value is looked up on chip through a pre-configured lookup table and the calculation is carried out at the higher bit width, so that accuracy is ensured without greatly increasing the bandwidth or the burden on computing resources. Because the lookup tables are configured in advance by off-chip calculation, all operations of high-precision neural network training or inference can be performed inside the neural network chip without floating-point calculation, which lowers the computing-power requirement on the host CPU and saves chip resources.
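A rough sketch of this per-layer flow is given below; the function and table names are assumptions for illustration, lut_up and lut_down stand for the first and second lookup tables configured off chip, and the step numbers refer to Fig. 1 described further on:

```python
import numpy as np

def quantized_layer(x_q8, w_q8, lut_up, lut_down,
                    scale_a, zp_a, scale_b, zp_b, scale_c, zp_c):
    """One dot-product layer in the proposed scheme (illustrative only).

    x_q8, w_q8 : stored 8-bit codes of the layer input and the weights.
    lut_up     : first lookup table, 8-bit code -> higher-bit fixed-point code.
    lut_down   : second lookup table, higher-bit result code -> 8-bit code.
    """
    a = lut_up[x_q8].astype(np.int64)    # step 103: look up higher-bit values on chip
    b = lut_up[w_q8].astype(np.int64)
    acc = np.sum((a - zp_a) * (b - zp_b))          # step 104: integer multiply-add
    # Rescale to the output grid; on chip this would itself be a fixed-point
    # multiply, floating point is used here only to keep the sketch short.
    q_c = int(round(zp_c + scale_a * scale_b / scale_c * float(acc)))
    q_c = int(np.clip(q_c, 0, len(lut_down) - 1))
    return lut_down[q_c]                 # step 105: store the result as an 8-bit code
```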
Drawings
To further clarify the above and other advantages and features of embodiments of the present invention, a more particular description of embodiments of the present invention will be rendered by reference to the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. In the drawings, the same or corresponding parts will be denoted by the same or similar reference numerals for clarity.
Fig. 1 is a flow chart illustrating a non-uniform quantization method of a neural network according to an embodiment of the present invention.
Detailed Description
In the following description, the present invention is described with reference to examples. One skilled in the relevant art will recognize, however, that the embodiments may be practiced without one or more of the specific details, or with other alternative and/or additional methods, materials, or components. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention. Similarly, for purposes of explanation, specific numbers, materials and configurations are set forth in order to provide a thorough understanding of the embodiments of the invention. However, the invention is not limited to these specific details. Further, it should be understood that the embodiments shown in the figures are illustrative representations and are not necessarily drawn to scale.
Reference in the specification to "one embodiment" or "the embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
It should be noted that the embodiment of the present invention describes the process steps in a specific order, however, this is only for the purpose of illustrating the specific embodiment, and does not limit the sequence of the steps. Rather, in various embodiments of the present invention, the order of the steps may be adjusted according to process adjustments.
In order to reduce bandwidth requirements, low bit widths, e.g., 8-bit data streams, are often used in neural networks to store data, weights, and the like. However, in the core convolution operation of a neural network, computing with low-bit numbers easily costs network accuracy, and the low-bit numbers usually need to be dequantized into floating point numbers before accurate calculation can be performed. In a convolution operation, the input is generally unfolded into a matrix according to the size of the convolution kernel and matrix multiplication is then performed; the core of this calculation is the dot product of rows and columns of the matrices. The dot product acts on two vectors of equal length and can be described by the following formula:
c = a_1*b_1 + a_2*b_2 + … + a_n*b_n
the length of a and b is n, and the length of C is the result of dot product, so it can be seen that if the floating point number is used for dot product operation, the resource consumption is more, and the occupied bandwidth is larger, which is easy to become the bottleneck of calculation. In view of this problem, the present invention provides a non-uniform quantization method for neural networks, and the scheme of the present invention is further described below with reference to the accompanying drawings of embodiments.
Fixed-point and floating-point are both numerical representations, which differ in where the point separating the integer part from the fractional part is placed. A fixed-point format stores an integer part and a fractional part with fixed numbers of bits, while a floating-point format stores a significand and an exponent. Taking the 8-bit fixed-point integer INT8 and the 32-bit floating-point format FP32 as examples, INT8 uses only 25% of the bits of FP32. However, the method used to convert values between INT8 and FP32 is very important, because it significantly affects the prediction accuracy.
Fig. 1 is a flow chart illustrating a non-uniform quantization method of a neural network according to an embodiment of the present invention. As shown in fig. 1, a non-uniform quantization method of a neural network includes:
first, in step 101, the lookup tables are configured. In order to reduce the amount of calculation inside the neural network chip, the correspondence between low-bit and high-bit fixed point numbers is calculated in advance off chip to form a first lookup table and a second lookup table, and the two tables are configured into the chip. The first lookup table is used for converting input data in low-bit fixed-point form into high-bit fixed point numbers, and the second lookup table is used for converting calculation results in high-bit fixed-point form into low-bit fixed point numbers. In one embodiment of the invention, the forming of the first lookup table comprises:
firstly, confirming a value range of the input data, and quantizing the data in this range according to a piecewise function to obtain first data; typically, the input data are in the form of 32-bit floating point numbers, and in one embodiment of the invention the data type of the first data is an 8-bit fixed point number; in a further embodiment of the present invention, the segments of the piecewise function are linear functions with the same or different slopes:

scale_i = (r_i2 - r_i1) / (q_i2 - q_i1)

where [r_i1, r_i2] is the value range of the input data in the i-th segment, and [q_i1, q_i2] is the corresponding value range of the first data; in an embodiment of the present invention, the piecewise function includes three segments:

q = round((r - r_i1) / scale_i) + q_i1, for r in the i-th of the segments [-f1, -k*f1], [-k*f1, k*f1] and [k*f1, f1],

where [-f1, f1] is the value range of the input data r, and k is any number between 0 and 1; in one embodiment of the present invention, k is 0.125, and the value range of q is [0, 2^n - 1], where n is the number of bits of the first data. Considering that the input data are distributed more densely in the middle section of the value range, scale_2 in the piecewise function is smaller than scale_1 and scale_3, that is, more quantization points are used in the second segment; taking 8 bits as an example, the q value ranges of the three segments are [0, 63], [64, 191] and [192, 255]. It should be understood that in other embodiments of the present invention, the segment intervals and/or the number of segments of the piecewise function may be set differently according to the distribution of the input data;
next, inverse quantizing the first data into floating point numbers using an inverse of the piecewise function;
then, quantizing the floating point numbers into second data, wherein the bit width of the second data is higher than that of the first data and is determined according to requirements after weighing the chip area against the quantization performance, for example 10 to 12-bit fixed point numbers; in one embodiment of the invention, the floating point numbers are quantized into the second data using a uniform quantization algorithm, in particular a linear function, and the original floating point number r and the quantized value q satisfy:

q = round(r / scale) + zeroPoint;

r = scale * (q - zeroPoint);

where scale and zeroPoint are obtained from the upper and lower bounds of the quantization interval; in one embodiment of the invention, assuming that the range of the floating point numbers to be quantized is [-f1, f1] and the range of the second data is [0, i1], then:

scale = 2 * f1 / i1;

zeroPoint = f1 / scale = i1 / 2;

finally, configuring the first data and the corresponding second data into the first lookup table. In one embodiment of the invention, the forming of the second lookup table comprises:
firstly, estimating the value range of a neural network convolution calculation result;
then, in the value range, establishing a mapping table from low-bit fixed point numbers to high-bit fixed point numbers, wherein the table is established in the same way as the first lookup table; the low-bit fixed point number refers to the data type used when storing input data and calculation results, i.e. the same type as the first data, and the high-bit fixed point number refers to a data type with a higher bit width, which is used for the convolution operation and is of the same type as the second data; and
finally, for each possible high-bit fixed point value, finding the nearest mapped number in the mapping table, and establishing a second lookup table;
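Under the assumptions already used above (an 8-bit first data type, the three-segment piecewise quantizer, and a uniform higher-bit grid over [-f1, f1]), the off-chip construction of the two tables might be sketched as follows; the 12-bit width, the choice f1 = 1.0 and the reuse of the input range for the output range are illustrative assumptions, not values fixed by the method:

```python
import numpy as np

def piecewise_dequantize(q, f1, k=0.125):
    """Inverse of the three-segment quantizer: 8-bit code -> floating point value
    (segment boundaries follow the 8-bit example above)."""
    segments = [(-f1, -k * f1, 0, 63),
                (-k * f1, k * f1, 64, 191),
                (k * f1, f1, 192, 255)]
    r = np.empty(q.shape, dtype=np.float32)
    for r_lo, r_hi, q_lo, q_hi in segments:
        scale = (r_hi - r_lo) / (q_hi - q_lo)
        mask = (q >= q_lo) & (q <= q_hi)
        r[mask] = r_lo + scale * (q[mask].astype(np.float32) - q_lo)
    return r

def build_first_lut(f1, k=0.125, hi_bits=12):
    """First lookup table: every 8-bit code -> uniformly quantized higher-bit code."""
    i1 = 2 ** hi_bits - 1
    scale = 2.0 * f1 / i1                     # uniform grid over [-f1, f1]
    zero_point = i1 // 2
    codes = np.arange(256, dtype=np.uint8)
    r = piecewise_dequantize(codes, f1, k)    # dequantize back to floating point
    return np.clip(np.round(r / scale) + zero_point, 0, i1).astype(np.int64)

def build_second_lut(first_lut):
    """Second lookup table: for each possible higher-bit value, the 8-bit code
    whose entry in the low-to-high mapping table is nearest."""
    i1 = int(first_lut.max())
    return np.array([int(np.argmin(np.abs(first_lut - v))) for v in range(i1 + 1)],
                    dtype=np.uint8)

# Off-chip configuration; both tables are then loaded into the chip.
lut_up = build_first_lut(f1=1.0)     # used at step 103
lut_down = build_second_lut(lut_up)  # used at step 105 (output range assumed = input range)
```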
next, at step 102, the input data are stored: the input data are quantized into a first fixed point number using a piecewise function, and the first fixed point number is stored; in one embodiment of the present invention, each segment of the piecewise function is a linear function with the same or a different slope, the slope being determined by the value range of the input data in that segment and by the data type of the first fixed point number; in yet another embodiment of the present invention, the data type of the first fixed point number is an 8-bit fixed point number;
next, at step 103, the high-bit fixed point number is looked up: the second fixed point number corresponding to the first fixed point number is determined through the first lookup table, the bit width of the second fixed point number being higher than that of the first fixed point number;
next, at step 104, a convolution operation is performed: the convolution is computed using the second fixed point numbers to obtain a calculation result; because the input data have already been converted into high-bit fixed point numbers at this point, the loss of neural network accuracy is reduced on the one hand, and on the other hand the dot product is converted into pure fixed-point multiply-add operations carried out entirely at the integer level, which greatly reduces the bandwidth requirement (a sketch of this integer-level computation is given after step 105 below):

q_c = zeroPoint_c + (scale_a * scale_b / scale_c) * sum_{i=1..n} (q_a[i] - zeroPoint_a) * (q_b[i] - zeroPoint_b)

where scale_a, scale_b, scale_c, zeroPoint_a, zeroPoint_b and zeroPoint_c are quantization parameters; and
finally, in step 105, the calculation result is stored: by searching the second lookup table, the convolution calculation result is converted into a third fixed point number for storage, wherein the data type of the third fixed point number is the same as that of the first fixed point number.
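A short sketch of the integer-level dot product given by the formula in step 104; the variable names and parameter values are illustrative assumptions only:

```python
import numpy as np

def quantized_dot(q_a, q_b, scale_a, zp_a, scale_b, zp_b, scale_c, zp_c):
    """Dot product on fixed-point codes: the accumulation is pure integer
    multiply-add; only the final rescale uses the quantization parameters."""
    acc = np.sum((q_a.astype(np.int64) - zp_a) * (q_b.astype(np.int64) - zp_b))
    return int(round(zp_c + scale_a * scale_b / scale_c * float(acc)))

# Example with two 4-element fixed-point vectors (parameters chosen arbitrarily).
q_a = np.array([10, 200, 50, 128], dtype=np.int64)
q_b = np.array([7, 90, 255, 0], dtype=np.int64)
q_c = quantized_dot(q_a, q_b, scale_a=0.01, zp_a=128,
                    scale_b=0.02, zp_b=128, scale_c=0.5, zp_c=2048)
print(q_c)
```

Dequantizing q_c with scale_c and zeroPoint_c recovers the floating-point dot product up to rounding error.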
According to the non-uniform quantization method of the present invention, the input data of each layer are stored in a simple non-uniformly quantized form, and when the core convolution operation of the neural network is performed, a higher-bit value is looked up on chip through the pre-configured lookup tables and the calculation is carried out at the higher bit width, so that accuracy is ensured without greatly increasing the bandwidth or the burden on computing resources.
Embodiments may be provided as a computer program product that may include one or more machine-readable media having stored thereon machine-executable instructions that, when executed by one or more machines such as a computer, network of computers, or other electronic devices, may result in the one or more machines performing operations in accordance with embodiments of the present invention. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (compact disc read-only memories), and magneto-optical disks, ROMs (read-only memories), RAMs (random access memories), EPROMs (erasable programmable read-only memories), EEPROMs (electrically erasable programmable read-only memories), magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing machine-executable instructions.
Moreover, embodiments may be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of one or more data signals embodied in and/or modulated by a carrier wave or other propagation medium via a communication link (e.g., a modem and/or network connection). Accordingly, a machine-readable medium as used herein may include, but is not required to be, such a carrier wave.
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various combinations, modifications, and changes can be made thereto without departing from the spirit and scope of the invention. Thus, the breadth and scope of the present invention disclosed herein should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims (10)

1. A method for non-uniform quantization of a neural network, comprising the steps of:
quantizing the input data into a first fixed point number by adopting a piecewise function, and storing the first fixed point number;
searching a first lookup table, and confirming a second fixed point number corresponding to the first fixed point number, wherein the bit number of the second fixed point number is higher than that of the first fixed point number;
performing convolution operation by adopting the second fixed point number to obtain a calculation result; and
searching a second lookup table, and converting the calculation result into a third fixed point number for storage, wherein the data type of the third fixed point number is the same as that of the first fixed point number.
2. The non-uniform quantization method of claim 1, wherein the data type of the first fixed-point number is an 8-bit fixed-point number.
3. The non-uniform quantization method of claim 1, wherein the first lookup table and the second lookup table are configured within a chip.
4. The non-uniform quantization method of claim 1, wherein each segment of the piecewise function is a linear function with the same or a different slope:

scale_i = (r_i2 - r_i1) / (q_i2 - q_i1)

where [r_i1, r_i2] is the value range of the input data in the i-th segment, and [q_i1, q_i2] is the corresponding value range of the first fixed point number.
5. The non-uniform quantization method of claim 4, wherein the piecewise function is:
q = round((r - r_i1) / scale_i) + q_i1, for r in the i-th of the segments [-f1, -k*f1], [-k*f1, k*f1] and [k*f1, f1],

where [-f1, f1] is the value range of the input data r, and k is an arbitrary number between 0 and 1.
6. The non-uniform quantization method of claim 1, wherein the forming of the first lookup table comprises the steps of:
confirming a value range of input data, and quantizing the data in the value range according to the piecewise function to obtain first data;
dequantizing the first data to a floating point number according to the piecewise function;
quantizing the floating point number into second data; and
configuring the first data and the corresponding second data into the first lookup table.
7. The non-uniform quantization method of claim 6, wherein the floating point number is quantized into the second data using a linear function, and the original floating point number r and the quantized second data q satisfy:

q = round(r / scale) + zeroPoint;

r = scale * (q - zeroPoint);

wherein scale and zeroPoint are calculated from the upper and lower bounds of the quantization interval.
8. The non-uniform quantization method of claim 1, wherein the forming of the second lookup table comprises the steps of:
estimating the value range of the convolution calculation result;
establishing a mapping table from low-bit fixed point numbers to high-bit fixed point numbers in the value range; and
for each possible high-bit fixed-point value, finding the nearest mapped number in the mapping table to form a second lookup table.
9. A computer-readable storage medium comprising instructions that, when executed, cause a system to perform the method of any of claims 1-8.
10. A non-uniform quantization system for neural networks, comprising:
a memory; and
a processor coupled to the memory and configured to perform the method of any of claims 1-8.
CN202011616502.9A 2020-12-30 2020-12-30 Non-uniform quantization method of neural network Active CN112712164B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011616502.9A CN112712164B (en) 2020-12-30 2020-12-30 Non-uniform quantization method of neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011616502.9A CN112712164B (en) 2020-12-30 2020-12-30 Non-uniform quantization method of neural network

Publications (2)

Publication Number Publication Date
CN112712164A true CN112712164A (en) 2021-04-27
CN112712164B CN112712164B (en) 2022-08-26

Family

ID=75547390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011616502.9A Active CN112712164B (en) 2020-12-30 2020-12-30 Non-uniform quantization method of neural network

Country Status (1)

Country Link
CN (1) CN112712164B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022236588A1 (en) * 2021-05-10 2022-11-17 Huawei Technologies Co., Ltd. Methods and systems for generating integer neural network from a full-precision neural network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190034796A1 (en) * 2017-07-28 2019-01-31 Beijing Deephi Intelligence Technology Co., Ltd. Fixed-point training method for deep neural networks based on static fixed-point conversion scheme
CN110852434A (en) * 2019-09-30 2020-02-28 成都恒创新星科技有限公司 CNN quantization method, forward calculation method and device based on low-precision floating point number
CN110929865A (en) * 2018-09-19 2020-03-27 深圳云天励飞技术有限公司 Network quantification method, service processing method and related product
US20200151019A1 (en) * 2019-03-14 2020-05-14 Rednova Innovations,Inc. OPU-based CNN acceleration method and system
US20200242474A1 (en) * 2019-01-24 2020-07-30 Microsoft Technology Licensing, Llc Neural network activation compression with non-uniform mantissas

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190034796A1 (en) * 2017-07-28 2019-01-31 Beijing Deephi Intelligence Technology Co., Ltd. Fixed-point training method for deep neural networks based on static fixed-point conversion scheme
CN110929865A (en) * 2018-09-19 2020-03-27 深圳云天励飞技术有限公司 Network quantification method, service processing method and related product
US20200242474A1 (en) * 2019-01-24 2020-07-30 Microsoft Technology Licensing, Llc Neural network activation compression with non-uniform mantissas
US20200151019A1 (en) * 2019-03-14 2020-05-14 Rednova Innovations,Inc. OPU-based CNN acceleration method and system
CN110852434A (en) * 2019-09-30 2020-02-28 成都恒创新星科技有限公司 CNN quantization method, forward calculation method and device based on low-precision floating point number

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DARRYL D. LIN ET AL.: "Fixed Point Quantization of Deep Convolutional Networks", arXiv *
WEI Miao et al.: "FPGA-based real-time enhancement system for 1080P low-quality video", Computer Technology and Development *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022236588A1 (en) * 2021-05-10 2022-11-17 Huawei Technologies Co., Ltd. Methods and systems for generating integer neural network from a full-precision neural network

Also Published As

Publication number Publication date
CN112712164B (en) 2022-08-26

Similar Documents

Publication Publication Date Title
CN110363279B (en) Image processing method and device based on convolutional neural network model
CN107480770B (en) Neural network quantization and compression method and device capable of adjusting quantization bit width
CN109800865B (en) Neural network generation and image processing method and device, platform and electronic equipment
CN111416743B (en) Convolutional network accelerator, configuration method and computer readable storage medium
WO2019238029A1 (en) Convolutional neural network system, and method for quantifying convolutional neural network
CN110929865B (en) Network quantification method, service processing method and related product
CN110880038A (en) System for accelerating convolution calculation based on FPGA and convolution neural network
WO2023279964A1 (en) Data compression method and apparatus, and computing device and storage medium
CN115599757A (en) Data compression method and device, computing equipment and storage system
CN112712164B (en) Non-uniform quantization method of neural network
CN111240746A (en) Floating point data inverse quantization and quantization method and equipment
CN111754405A (en) Image resolution reduction and restoration method, equipment and readable storage medium
US20230131251A1 (en) System and method for memory compression for deep learning networks
CN114612996A (en) Method for operating neural network model, medium, program product, and electronic device
CN114529741A (en) Picture duplicate removal method and device and electronic equipment
CN116502691A (en) Deep convolutional neural network mixed precision quantization method applied to FPGA
CN112085175B (en) Data processing method and device based on neural network calculation
CN114943335A (en) Layer-by-layer optimization method of ternary neural network
CN111401546A (en) Training method of neural network model, medium thereof, and electronic device
CN111344719A (en) Data processing method and device based on deep neural network and mobile device
CN113962385A (en) Neural network training and data processing method and device, medium and computer equipment
Zhan et al. Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems
CN114169513B (en) Neural network quantization method and device, storage medium and electronic equipment
CN113177627B (en) Optimization system, retraining system, method thereof, processor and readable medium
CN110276448B (en) Model compression method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant