CN114756517A - Visual Transformer compression method and system based on differentiable quantization training - Google Patents
- Publication number
- CN114756517A (application CN202210295189.6A)
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
Abstract
The invention discloses a visual Transformer compression method and system based on differentiable quantization training, belonging to the technical field of artificial intelligence. The method comprises the following steps: step one, dividing an input picture into blocks and converting it into a corresponding picture sequence through linear mapping; step two, passing the picture sequence M times in sequence through alternating quantized processing of global information and local information to obtain a compressed picture sequence; step three, classifying the compressed picture sequence and outputting a predicted probability value. A differentiable quantization step-size training method is introduced throughout steps one to three, which improves how well each quantization step size matches the image data; meanwhile, in step two, a differentiable quantization bias training method is introduced during local information quantization, which automatically learns an optimal quantization interval and preserves the information of the negative activation region. The performance loss caused by quantization is thereby reduced and the quantization accuracy improved.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence, and in particular relates to a visual Transformer compression method and system based on differentiable quantization training.
Background
In recent years, models based on the Transformer architecture have achieved remarkable results on a wide range of natural language processing tasks. In the field of computer vision, work based on the visual Transformer (vision Transformer) has likewise matched or even surpassed traditional convolutional neural networks on many visual tasks, including classification, detection, segmentation, super-resolution and denoising. However, the visual Transformer has a very large number of parameters, and its computation grows quadratically with the resolution of the input picture, which brings high memory occupation and high latency at inference time and makes deployment difficult on devices with limited computing power, such as mobile terminals and autonomous-driving chips. It is therefore crucial to explore suitable compression techniques that greatly reduce the model size and inference latency of the visual Transformer while keeping the performance loss low.
Quantization has been used extensively in convolutional neural networks as an efficient compression technique. Whether in a convolutional neural network or a visual Transformer model, the core operation is matrix multiplication. By quantizing both the weights and the features, originally 32-bit floating-point numbers, into low-bit fixed-point numbers, the floating-point matrix multiplications can be replaced by low-bit fixed-point matrix multiplications, accelerating inference while compressing the model size. Depending on whether fine-tuning is performed after quantization, quantization is divided into post-training quantization and quantization-aware training. For visual Transformers, existing work based on post-training quantization suffers a large performance loss, while traditional quantization-aware training methods do not fully account for the characteristics of the visual Transformer, and their performance at low bit-widths is not ideal.
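To illustrate why this accelerates inference, a minimal NumPy sketch (not the patent's implementation; the step sizes `ax`, `aw` and the 8-bit width are assumptions for the example) shows a floating-point matrix multiplication replaced by an integer one followed by a single rescale:

```python
import numpy as np

def int8_matmul(x, w, ax, aw, b=8):
    """Replace the float matmul x @ w by an integer matmul on fixed-point
    codes: since x ~ ax*qx and w ~ aw*qw, x @ w ~ (ax*aw) * (qx @ qw)."""
    q_max = 2 ** (b - 1) - 1
    qx = np.clip(np.round(x / ax), -(q_max + 1), q_max).astype(np.int32)
    qw = np.clip(np.round(w / aw), -(q_max + 1), q_max).astype(np.int32)
    return (ax * aw) * (qx @ qw)  # one float rescale after the integer matmul

x = np.full((2, 3), 0.5)
w = np.full((3, 2), 0.25)
out = int8_matmul(x, w, ax=0.01, aw=0.01)
```

Here the inputs are exact multiples of the assumed step sizes, so the result matches the float product exactly; in general the mismatch is the quantization error the training methods below try to minimize.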
Disclosure of Invention
To solve the technical problems identified in the background, the invention provides a visual Transformer compression method and system based on differentiable quantization training.
The invention adopts the following technical scheme: a visual Transformer compression method based on differentiable quantization training comprises the following steps:
step one, dividing an input picture into blocks and converting it into a corresponding picture sequence through linear mapping;
step two, passing the picture sequence M times in sequence through alternating quantized processing of global information and local information to obtain a compressed picture sequence;
step three, classifying the compressed picture sequence and outputting a predicted probability value;
a differentiable quantization step-size training method is introduced throughout steps one to three, improving how well each quantization step size matches the image data; meanwhile, in step two, a differentiable quantization bias training method is introduced during local information quantization, which automatically learns an optimal quantization interval and preserves the information of the negative activation region.
In a further embodiment, when performing the differentiable step-size training method and/or the differentiable bias training method, the method further comprises a quantization parameter initialization based on minimizing the mean square error.
In a further embodiment, the differentiable step-size training method applies to both image feature quantization and image weight quantization;
the differentiable step-size training method comprises the following procedure:
defining the full-precision weight as w and the quantized fixed-point weight as q, the quantization operation is expressed as:
q = clip(round(w/α), −q_min, q_max)
where clip(z, a, b) sets the elements of the matrix z that are smaller than a to a, and the elements that are larger than b to b; round denotes rounding to the nearest integer; α denotes the differentiable step size; and −q_min and q_max denote the minimum and maximum of the quantization range respectively;
the floating-point value corresponding to the fixed-point weight q is then recovered by the dequantization operation ŵ = α·q.
In a further embodiment, the differentiable bias training method comprises the following procedure:
defining the full-precision weight as w and the quantized fixed-point weight as q, the quantization operation is expressed as:
q = clip(round((w − β)/α), −q_min, q_max)
where clip(z, a, b) sets the elements of the matrix z that are smaller than a to a, and the elements that are larger than b to b; round denotes rounding to the nearest integer; α denotes the differentiable step size; and −q_min and q_max denote the minimum and maximum of the quantization range respectively;
the floating-point value corresponding to the fixed-point weight q is then recovered by the dequantization operation ŵ = α·q + β, where β is the introduced differentiable bias.
In a further embodiment, the values of q_min and q_max are as follows: given the quantization bit-width b,
for signed quantization, q_min = 2^(b−1) and q_max = 2^(b−1) − 1;
for unsigned quantization, q_min = 0 and q_max = 2^b − 1.
In a further embodiment, the dequantization operation uses a straight-through estimator to process the gradient: when α is updated, its gradient is additionally divided by a scaling factor g = √(N_w · q_max), where N_w denotes the number of elements of the full-precision weight w.
In a further embodiment, the quantization parameter initialization based on minimizing the mean square error specifically comprises the following procedure:
for a layer with only a differentiable step size α and no bias, the initialization method is expressed as:
α* = argmin_α ‖w − α·q‖²
q and α are solved alternately (for fixed α, q = clip(round(w/α), −q_min, q_max); for fixed q, α = (wᵀq)/(qᵀq)) until α converges; the converged value is taken as the initial value of α, which is then updated by gradient descent.
In a further embodiment, for a layer with an additional bias β, the initialization method is expressed as:
α*, β* = argmin_{α,β} ‖w − (α·q + β)‖², with β* = E(w − α·q)
where E(z) denotes the mean of all elements of the vector z; q, α and β are solved alternately until α and β converge; the converged values are taken as the initial values of α and β, which are then updated by gradient descent.
A visual Transformer compression system based on differentiable quantization training comprises:
a quantization processing layer, configured to divide an input picture into blocks and convert it into a corresponding picture sequence through linear mapping;
a self-attention layer, configured to perform global information quantization processing on the picture sequence;
a feedforward layer, configured to perform local information quantization processing on the picture sequence, the feedforward layer comprising an activation layer; the self-attention layer and the feedforward layer alternate M times;
a classification processing layer, configured to classify the compressed picture sequence and output a predicted probability value;
further comprising: a differentiable step-size training module, embedded in turn in the quantization processing layer, the self-attention layer, the feedforward layer and the classification processing layer, and configured to improve how well each quantization step size matches the image data;
and a differentiable bias training module, embedded in the activation layer and configured to automatically learn an optimal quantization interval and preserve the information of the negative activation region.
In a further embodiment, the system further comprises a quantization parameter initialization module connected to both the differentiable step-size training module and the differentiable bias training module.
The invention has the following beneficial effects: a differentiable quantization step-size training method is introduced into the compression process so that the quantizer step size better matches the distribution of the data, greatly reducing the quantization error. Meanwhile, when local information is quantized, a differentiable quantization bias training method is introduced so that the information of the negative activation region is preserved. When the differentiable step-size and bias training methods are run, quantization parameter initialization based on minimizing the mean square error is used, ensuring the convergence speed of the model and avoiding degraded performance of the quantized model caused by slow convergence.
Drawings
Fig. 1 is a diagram of self-attention layer quantization.
Fig. 2 is an activation comparison diagram.
Detailed Description
The invention is further described below with reference to the drawings and specific embodiments.
Example 1
This embodiment discloses a visual Transformer compression method based on differentiable quantization training, comprising the following steps:
step one, dividing an input picture into blocks and converting it into a corresponding picture sequence through linear mapping; in this embodiment, performance close to the full-precision visual Transformer model can be achieved with either 8-bit quantization (4× compression) or 4-bit quantization (8× compression).
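Step one can be sketched as follows (a minimal NumPy sketch; the patch size 16, embedding dimension 192 and random projection are assumptions for illustration, not parameters specified by the patent):

```python
import numpy as np

def patchify_embed(image, patch=16, dim=192, seed=0):
    """Split an image (H, W, C) into non-overlapping patches and map each
    flattened patch to an embedding vector with one linear projection."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    # (H, W, C) -> (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (N, p*p*C)
    patches = image.reshape(h // patch, patch, w // patch, patch, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((patch * patch * c, dim)) * 0.02  # assumed init
    return patches @ proj  # picture sequence: (num_patches, dim)

tokens = patchify_embed(np.zeros((224, 224, 3)))
```

For a 224×224×3 picture and 16×16 patches this yields a sequence of 14×14 = 196 tokens.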
step two, passing the picture sequence M times in sequence through alternating quantized processing of global information and local information to obtain a compressed picture sequence, where M is an integer; the M rounds of alternating quantization improve the quality of the quantized result while guaranteeing fast compression. In this embodiment M is 12.
step three, classifying the compressed picture sequence and outputting a predicted probability value; in this embodiment, 8-bit quantization (4× compression) may be used.
A differentiable quantization step-size training method is introduced throughout steps one to three, improving how well each quantization step size matches the image data; in other words, the step-size training method is used every time quantization is performed.
Meanwhile, a differentiable quantization bias training method is introduced during local information quantization, which automatically learns an optimal quantization interval and preserves the information of the negative activation region; in other words, the bias training method is applied every time the activation layer is executed.
In a further embodiment, the differentiable step-size training method applies to both image feature quantization and image weight quantization, i.e. the quantization strategies for the features and the weights of the image are the same. Taking weight quantization as an example, define the full-precision weight as w and the quantized fixed-point weight as q; the quantization operation is expressed as:
q = clip(round(w/α), −q_min, q_max)
where clip(z, a, b) sets the elements of the matrix z that are smaller than a to a, and the elements that are larger than b to b; round denotes rounding to the nearest integer; α denotes the differentiable step size; and −q_min and q_max denote the minimum and maximum of the quantization range respectively;
the floating-point value corresponding to the fixed-point weight q is then recovered by the dequantization operation ŵ = α·q.
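The two formulas above can be sketched directly (a minimal NumPy sketch; the example step size α = 0.25 and the 4-bit signed range are assumptions for illustration; `np.round` uses half-to-even rounding, a stand-in for the patent's rounding):

```python
import numpy as np

def quantize(w, alpha, q_min, q_max):
    """q = clip(round(w / alpha), -q_min, q_max): the fixed-point code."""
    return np.clip(np.round(w / alpha), -q_min, q_max)

def dequantize(q, alpha):
    """Recover the floating-point value w_hat = alpha * q."""
    return alpha * q

# signed 4-bit: q_min = 2^(4-1) = 8, q_max = 2^(4-1) - 1 = 7
w = np.array([-1.3, -0.24, 0.0, 0.26, 0.9])
q = quantize(w, alpha=0.25, q_min=8, q_max=7)
w_hat = dequantize(q, 0.25)
```

Each recovered value is the multiple of α nearest to the original (up to clipping), so the mismatch w − ŵ is the quantization error that training the step size reduces.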
In another embodiment, using the GELU activation function in the local information quantization process improves the performance of the model. Compared with the ReLU activation function, GELU introduces negative activation values. That is, in this embodiment, as shown in fig. 2, quantization of the GELU activation layer cannot directly use unsigned quantization as with ReLU, since doing so would lose the information contained in the negative activation values.
Therefore, to solve the above technical problem, a differentiable bias training method is introduced for the quantization of local information, comprising the following procedure:
defining the full-precision value as w and the quantized fixed-point value as q, the quantization operation is expressed as:
q = clip(round((w − β)/α), −q_min, q_max)
where clip(z, a, b) sets the elements of the matrix z that are smaller than a to a, and the elements that are larger than b to b; round denotes rounding to the nearest integer; α denotes the differentiable step size; and −q_min and q_max denote the minimum and maximum of the quantization range respectively;
the floating-point value corresponding to q is then recovered by the dequantization operation ŵ = α·q + β, where β is the introduced differentiable bias.
When the differentiable step-size and bias training methods perform the dequantization operation, the round operation yields a zero gradient almost everywhere during back-propagation, so a straight-through estimator (STE) is used to process the gradient, i.e. the round operation is ignored in the backward pass. When α is updated, its gradient is additionally divided by a scaling factor g = √(N_w · q_max), where N_w denotes the number of elements of the full-precision weight w.
This prevents the model from failing to converge because α changes too drastically. Compared with the traditional fixed-step quantization training method, the differentiable step-size training method introduced in this embodiment therefore makes the quantizer step size better match the distribution of the data, greatly reducing the quantization error.
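Ignoring round in the backward pass gives the gradient of ŵ = α·clip(round(w/α), −q_min, q_max) with respect to α a simple per-element form; the sketch below is my reconstruction under that STE assumption (learned-step-size style), not code from the patent:

```python
import numpy as np

def step_size_grad(w, alpha, q_min, q_max):
    """d w_hat / d alpha under the straight-through estimator:
    elements clipped low contribute -q_min, elements clipped high contribute
    q_max, and in-range elements contribute round(w/alpha) - w/alpha;
    the result is then divided by the scaling factor g = sqrt(N_w * q_max)."""
    v = w / alpha
    per_elem = np.where(v <= -q_min, -q_min,
               np.where(v >= q_max, q_max, np.round(v) - v))
    g = np.sqrt(w.size * q_max)
    return per_elem / g
```

The division by g keeps the update magnitude of α comparable across layers of different sizes, which is what prevents α from changing too drastically.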
In a further embodiment, the values of q_min and q_max are as follows: given the quantization bit-width b,
for signed quantization, q_min = 2^(b−1) and q_max = 2^(b−1) − 1;
for unsigned quantization, q_min = 0 and q_max = 2^b − 1.
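These ranges are easy to tabulate (a small helper; the function name is mine):

```python
def quant_range(b, signed=True):
    """Return (q_min, q_max): codes lie in [-q_min, q_max] for signed
    quantization, or in [0, q_max] with q_min = 0 for unsigned."""
    if signed:
        return 2 ** (b - 1), 2 ** (b - 1) - 1
    return 0, 2 ** b - 1
```

For example, signed 8-bit gives the familiar range [−128, 127], and unsigned 4-bit gives [0, 15].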
In another embodiment, although the differentiable quantization step size and bias are learnable parameters, choosing an appropriate quantization parameter initialization is still important. A poorly chosen initialization slows the convergence of the model and degrades the performance of the quantized model.
Therefore, when the differentiable step-size training method and/or the differentiable bias training method is performed, the method further includes quantization parameter initialization based on minimizing the mean square error, which specifically comprises the following procedure:
for a layer with only a differentiable step size α and no bias, the initialization method is expressed as:
α* = argmin_α ‖w − α·q‖²
q and α are solved alternately (for fixed α, q = clip(round(w/α), −q_min, q_max); for fixed q, α = (wᵀq)/(qᵀq)) until α converges; the converged value is taken as the initial value of α, which is then updated by gradient descent.
Likewise, the dequantization operation processes the gradient with a straight-through estimator: when α is updated, its gradient is divided by the scaling factor g = √(N_w · q_max), where N_w denotes the number of elements of the full-precision weight w.
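The alternating solve can be sketched as follows (a minimal NumPy sketch; the starting guess for α and the fixed iteration count are my assumptions — the patent only specifies iterating q and α until α converges):

```python
import numpy as np

def init_step_mse(w, q_min, q_max, iters=50):
    """Initialize alpha by alternately solving q and alpha to minimise
    ||w - alpha * q||^2 (MSE-based initialization)."""
    alpha = np.max(np.abs(w)) / q_max          # assumed starting guess
    for _ in range(iters):
        q = np.clip(np.round(w / alpha), -q_min, q_max)
        denom = float(q @ q)
        if denom == 0.0:
            break
        alpha = float(w @ q) / denom           # least-squares alpha for fixed q
    return alpha
```

Each iteration can only decrease the reconstruction error, so the converged α is a sensible starting point for the subsequent gradient-descent updates.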
Based on this method, tests were carried out on DeiT-Tiny and DeiT-Small with the ImageNet 2012 dataset; the reported accuracy is the result on the validation set. Table 1-1 shows the compression ratios and Top-1 classification accuracy of the different models at different bit-widths, where FP32 denotes the model represented with 32-bit floating-point numbers, i.e. the full-precision model, and Int8 and Int4 denote the 8-bit and 4-bit quantized models respectively. For 8-bit quantization, fine-tuning for only 1 epoch is needed, whereas 4-bit quantization requires fine-tuning for 300 epochs. As the table shows, the accuracy loss of the quantized model is within 0.5% for both Int8 and Int4.
TABLE 1-1 visual Transformer quantitative training experiment results
Example 2
To implement the visual Transformer compression method described in embodiment 1, this embodiment discloses a visual Transformer compression system based on differentiable quantization training, comprising:
a quantization processing layer, configured to divide an input picture into blocks and convert it into a corresponding picture sequence through linear mapping; in this embodiment, performance close to the full-precision visual Transformer model can be achieved with either 8-bit quantization (4× compression) or 4-bit quantization (8× compression);
a self-attention layer, configured to perform global information quantization processing on the picture sequence; by quantizing the weights and features in the self-attention layer into fixed-point numbers, all operations in the self-attention layer are realized as fixed-point matrix multiplications. For the attention weights (attention scores), this embodiment uses unsigned quantization, because their values are always non-negative; all other weights and features use signed quantization, as shown in fig. 1. The notation of fig. 1 is explained in table 2.
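A minimal NumPy sketch of the fake-quantized self-attention described above (single head; the shared step size α = 0.02 is a simplifying assumption — in the patent each tensor would have its own learned step size):

```python
import numpy as np

def fake_quant(x, alpha, lo, hi):
    """Quantize then dequantize: alpha * clip(round(x / alpha), lo, hi)."""
    return alpha * np.clip(np.round(x / alpha), lo, hi)

def quantized_self_attention(x, wq, wk, wv, alpha=0.02):
    lo, hi = -128, 127                          # signed 8-bit range
    q = fake_quant(x @ fake_quant(wq, alpha, lo, hi), alpha, lo, hi)
    k = fake_quant(x @ fake_quant(wk, alpha, lo, hi), alpha, lo, hi)
    v = fake_quant(x @ fake_quant(wv, alpha, lo, hi), alpha, lo, hi)
    logits = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(logits - logits.max(-1, keepdims=True))
    p = p / p.sum(-1, keepdims=True)
    # attention scores are always >= 0, so use unsigned 8-bit (q_min = 0)
    p = fake_quant(p, 1.0 / 255, 0, 255)
    return p @ v

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)) * 0.5
out = quantized_self_attention(
    x, *(rng.standard_normal((8, 8)) * 0.1 for _ in range(3)))
```

In a real deployment the fake-quantized matmuls would run as integer matmuls with a rescale, as sketched earlier; fake quantization is the form used during quantization-aware training.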
TABLE 2
a feedforward layer, configured to perform local information quantization processing on the picture sequence, the feedforward layer comprising an activation layer; the self-attention layer and the feedforward layer alternate M times;
a classification processing layer configured to classify the compressed picture sequence and output a predicted probability value; in the present embodiment, 8-bit quantization (4 times compression ratio) is not used.
Further comprising: a differentiable step-size training module, embedded in turn in the quantization processing layer, the self-attention layer, the feedforward layer and the classification processing layer, and configured to improve how well each quantization step size matches the image data;
and a differentiable bias training module, embedded in the activation layer and configured to automatically learn an optimal quantization interval and preserve the information of the negative activation region.
Claims (10)
1. A visual Transformer compression method based on differentiable quantization training, characterized by comprising the following steps:
step one, dividing an input picture into blocks and converting it into a corresponding picture sequence through linear mapping;
step two, passing the picture sequence M times in sequence through alternating quantized processing of global information and local information to obtain a compressed picture sequence, where M is an integer;
step three, classifying the compressed picture sequence and outputting a predicted probability value; a differentiable quantization step-size training method is introduced throughout steps one to three, improving how well each quantization step size matches the image data; meanwhile, in step two, a differentiable quantization bias training method is introduced during local information quantization, which automatically learns an optimal quantization interval and preserves the information of the negative activation region.
2. The visual Transformer compression method based on differentiable quantization training of claim 1, further comprising a quantization parameter initialization based on minimizing the mean square error when performing the differentiable step-size training method and/or the differentiable bias training method.
3. The visual Transformer compression method based on differentiable quantization training of claim 1, wherein the differentiable step-size training method applies to both image feature quantization and image weight quantization;
the differentiable step-size training method comprises the following procedure:
defining the full-precision weight as w and the quantized fixed-point weight as q, the quantization operation is expressed as:
q = clip(round(w/α), −q_min, q_max)
where clip(z, a, b) sets the elements of the matrix z that are smaller than a to a, and the elements that are larger than b to b; round denotes rounding to the nearest integer; α denotes the differentiable quantization step size; and −q_min and q_max denote the minimum and maximum of the quantization range respectively; the floating-point value corresponding to q is recovered by the dequantization operation ŵ = α·q.
4. The visual Transformer compression method based on differentiable quantization training of claim 1, wherein the differentiable bias training method comprises the following procedure:
defining the full-precision weight as w and the quantized fixed-point weight as q, the quantization operation is expressed as:
q = clip(round((w − β)/α), −q_min, q_max)
where clip(z, a, b) sets the elements of the matrix z that are smaller than a to a, and the elements that are larger than b to b; round denotes rounding to the nearest integer; α denotes the differentiable step size; and −q_min and q_max denote the minimum and maximum of the quantization range respectively; the floating-point value corresponding to q is recovered by the dequantization operation ŵ = α·q + β, where β is the introduced differentiable bias.
5. The visual Transformer compression method based on differentiable quantization training of claim 3 or 4, wherein the values of q_min and q_max are as follows: given the quantization bit-width b,
for signed quantization, q_min = 2^(b−1) and q_max = 2^(b−1) − 1;
for unsigned quantization, q_min = 0 and q_max = 2^b − 1.
6. The visual Transformer compression method based on differentiable quantization training of claim 3 or 4, wherein the dequantization operation uses a straight-through estimator to process the gradient: when α is updated, its gradient is additionally divided by a scaling factor g = √(N_w · q_max), where N_w denotes the number of elements of the full-precision weight w.
7. The visual Transformer compression method based on differentiable quantization training of claim 2, wherein the quantization parameter initialization based on minimizing the mean square error specifically comprises the following procedure:
for a layer with only a differentiable step size α and no bias, the initialization method is expressed as:
α* = argmin_α ‖w − α·q‖²
q and α are solved alternately until α converges; the converged value is taken as the initial value of α, which is then updated by gradient descent.
8. The visual Transformer compression method based on differentiable quantization training of claim 2, wherein, for a layer with an additional bias β, the initialization method is expressed as:
β* = E(w − α·q)
where E(z) denotes the mean of all elements of the vector z; q, α and β are solved alternately until α and β converge; the converged values are taken as the initial values of α and β, which are then updated by gradient descent.
9. A visual transform compression system based on micro-quantization training, comprising:
the quantization processing layer is used for carrying out blocking processing on an input picture and converting the input picture into a corresponding picture sequence through linear mapping;
a self-attention layer configured to perform global information quantization processing on the picture sequence;
a feedforward layer configured to perform global information quantization processing on a picture sequence; the feed-forward layer comprises an active layer; wherein the self-attention layer and the feedforward layer are alternately arranged M times;
a classification processing layer configured to classify the compressed picture sequence and output a predicted probability value;
further comprising: the system comprises a quantization step length training module, a self-attention layer, a feedforward layer and a classification processing layer, wherein the quantization step length training module is embedded into a quantization processing layer, a self-attention layer, a feedforward layer and a classification processing layer in sequence; the micro-quantization step training module is set to improve the matching degree of each micro-quantization step and the image data;
a micro-quantifiable bias training module embedded in the active layer; the micro-quantization bias training module is set to automatically learn to obtain an optimal quantization interval and keep the information of the negative activation region.
10. The visual Transformer compression system based on micro-quantization training of claim 9, further comprising: a quantization parameter initialization module connected to both the micro-quantizable step training module and the micro-quantizable bias training module.
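How a micro-quantizable step might then be refined by gradient descent can be sketched with an LSQ-style local gradient; the estimator and the reconstruction-error loss are assumptions for illustration, since the claims only state that α is updated by the gradient descent method:

```python
import numpy as np

def fake_quant(w, alpha, qmin, qmax):
    """Quantize-dequantize with step alpha (the straight-through forward pass)."""
    return alpha * np.clip(np.round(w / alpha), qmin, qmax)

def alpha_grad(w, alpha, qmin, qmax):
    """dL/dalpha for L = mean((fake_quant(w) - w)^2) with an LSQ-style gradient."""
    v = w / alpha
    q = np.clip(np.round(v), qmin, qmax)
    # inside the clipping range d(w_hat)/d(alpha) = round(v) - v; outside, qmin/qmax
    dwhat = np.where((v > qmin) & (v < qmax), q - v, q)
    err = alpha * q - w
    return float(np.mean(2.0 * err * dwhat))

# a few gradient-descent steps on alpha for a toy weight tensor
w = np.random.default_rng(0).normal(size=1024)
alpha, lr = 0.05, 1e-3
for _ in range(200):
    alpha -= lr * alpha_grad(w, alpha, -128, 127)
```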
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210295189.6A CN114756517A (en) | 2022-03-24 | 2022-03-24 | Visual Transformer compression method and system based on micro-quantization training |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114756517A true CN114756517A (en) | 2022-07-15 |
Family
ID=82327804
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210295189.6A Pending CN114756517A (en) | 2022-03-24 | 2022-03-24 | Visual Transformer compression method and system based on micro-quantization training |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114756517A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116152117A (en) * | 2023-04-18 | 2023-05-23 | China Coal Research Institute Co., Ltd. | Underground low-light image enhancement method based on Transformer
CN117689044A (en) * | 2024-02-01 | 2024-03-12 | Xiamen University | Quantification method suitable for vision self-attention model
Similar Documents
Publication | Title
---|---
CN108510067B | Convolutional neural network quantification method based on engineering realization
CN111079781B | Lightweight convolutional neural network image recognition method based on low rank and sparse decomposition
US20170270408A1 | Method and System for Bit-Depth Reduction in Artificial Neural Networks
CN114756517A | Visual Transformer compression method and system based on micro-quantization training
CN113011571B | INT8 offline quantization and integer inference method based on Transformer model
CN107516129A | Deep network compression method based on dimension-adaptive Tucker decomposition
CN114402596B | Neural network model decoding method, device, system and medium
CN111160524A | Two-stage convolutional neural network model compression method
CN110619392A | Deep neural network compression method for embedded mobile equipment
CN115238893B | Neural network model quantification method and device for natural language processing
CN111310888A | Method for processing convolutional neural network
CN113918882A | Hardware-realizable data processing acceleration method for a dynamic sparse attention mechanism
CN114139683A | Neural network accelerator model quantization method
CN113610227A | Efficient deep convolutional neural network pruning method
CN118153715A | Large language model fine-tuning method based on low-rank matrix decomposition
CN113204640B | Text classification method based on attention mechanism
Gray et al. | Vector quantization and density estimation
CN112989843B | Intention recognition method, device, computing equipment and storage medium
CN117725435A | Multi-modal large model adaptation method and storage medium
CN112257466A | Model compression method applied to small machine translation equipment
CN112418388A | Method and device for realizing deep convolutional neural network processing
CN115860062A | Neural network quantization method and device suitable for FPGA
CN114065913A | Model quantization method and device and terminal equipment
CN115965062A | FPGA acceleration method for BERT layer-normalization nonlinear functions
CN111985613B | Normalization method of convolutional neural network circuit based on L1-norm group normalization
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |