CN114756517A - Visual Transformer compression method and system based on micro-quantization training - Google Patents

Visual Transformer compression method and system based on micro-quantization training

Info

Publication number
CN114756517A
Authority
CN
China
Prior art keywords
quantization
micro
training
layer
alpha
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210295189.6A
Other languages
Chinese (zh)
Inventor
李哲鑫
张一帆
王培松
程健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Institute of Automation of Chinese Academy of Science
Original Assignee
Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.) 2022-03-24
Filing date 2022-03-24
Publication date 2022-07-15
Application filed by Zhongke Nanjing Artificial Intelligence Innovation Research Institute, Institute of Automation of Chinese Academy of Science filed Critical Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Priority to CN202210295189.6A priority Critical patent/CN114756517A/en
Publication of CN114756517A publication Critical patent/CN114756517A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Algebra (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a visual Transformer compression method and system based on micro-quantization (differentiable quantization) training, and belongs to the technical field of artificial intelligence. The method comprises the following steps: step one, an input picture is partitioned into blocks and converted into a corresponding picture sequence through linear mapping; step two, the picture sequence undergoes M rounds of alternating quantized processing of global information and local information to obtain a compressed picture sequence; step three, the compressed picture sequence is classified and a predicted probability value is output. A micro-quantization step size training method is introduced throughout steps one to three to improve how well each quantization step size matches the image data; in addition, in step two, a micro-quantization bias training method is introduced when local information is quantized, so that an optimal quantization interval is learned automatically and the information in the negative activation region is preserved. Performance loss caused by quantization is thereby reduced and quantization accuracy is improved.

Description

Visual Transformer compression method and system based on micro-quantization training
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a visual Transformer compression method and system based on micro-quantization training.
Background
In recent years, models based on the Transformer architecture have achieved remarkable results on a wide range of natural language processing tasks. In the field of computer vision, work based on the visual Transformer (vision Transformer) has likewise reached, and in some cases surpassed, the performance of traditional convolutional neural networks on a variety of visual tasks, including classification, detection, segmentation, super-resolution, and denoising. However, the visual Transformer has a very large number of parameters, and its computation grows quadratically with the input resolution, which leads to high memory occupation and high latency at inference time and makes deployment difficult on computationally limited devices such as mobile terminals and autonomous-driving chips. It is therefore crucial to explore suitable compression techniques that greatly reduce the model size and inference latency of the visual Transformer while keeping the performance loss low.
Quantization has been used extensively in convolutional neural networks as an efficient compression technique. Whether in a convolutional neural network or a visual Transformer model, the core operation is matrix multiplication. By quantizing both the weights and the features from the original 32-bit floating-point numbers into low-bit fixed-point numbers, the original floating-point matrix multiplications can be replaced by low-bit fixed-point matrix multiplications, compressing the model size while accelerating inference. Depending on whether fine-tuning is performed after quantization, methods divide into post-training quantization (post-quantization) and quantization-aware training. For visual Transformers, existing work based on post-quantization incurs a large performance penalty, while traditional quantization-aware training methods do not fully account for the characteristics of the visual Transformer, so their performance at low bit-widths is not ideal.
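To make the substitution concrete, here is a minimal NumPy sketch, not taken from the patent: per-tensor quantization of the weight and feature matrices turns one floating-point matmul into one integer matmul plus a single floating-point rescale. The max-abs step sizes below are a simple stand-in for the learned step sizes discussed later.

```python
import numpy as np

def quantize(t, step, q_lo, q_hi):
    """q = clip(round(t / step), q_lo, q_hi): float tensor -> low-bit integers."""
    return np.clip(np.round(t / step), q_lo, q_hi).astype(np.int32)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)   # full-precision weights
x = rng.standard_normal((64, 64)).astype(np.float32)   # full-precision features

# Illustrative per-tensor step sizes; the method below learns these instead.
s_w, s_x = np.abs(w).max() / 127, np.abs(x).max() / 127
q_w = quantize(w, s_w, -128, 127)                      # 8-bit signed
q_x = quantize(x, s_x, -128, 127)

y_int = q_w @ q_x                   # pure integer matrix multiplication
y_hat = (s_w * s_x) * y_int         # one float rescale recovers the scale
y_ref = w @ x
print(np.abs(y_hat - y_ref).max())  # small approximation error
```

On hardware with int8 arithmetic, the integer matmul is where the inference speedup comes from; the rescale is a single scalar multiply per output tensor.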
Disclosure of Invention
The invention provides a visual Transformer compression method and system based on micro-quantization training, aiming to solve the technical problems described in the background above.
The invention adopts the following technical scheme: a visual Transformer compression method based on micro-quantization training comprises the following steps:
step one, partitioning an input picture into blocks and converting it into a corresponding picture sequence through linear mapping;
step two, subjecting the picture sequence to M rounds of alternating quantized processing of global information and local information to obtain a compressed picture sequence;
step three, classifying the compressed picture sequence and outputting a predicted probability value;
a micro-quantization step size training method is introduced throughout steps one to three, improving how well each quantization step size matches the image data; in addition, in step two, a micro-quantization bias training method is introduced when local information is quantized, so that an optimal quantization interval is learned automatically and the information in the negative activation region is preserved.
In a further embodiment, when the micro-quantization step size training method and/or the micro-quantization bias training method is performed, quantization parameter initialization based on minimizing the mean square error is also included.
In a further embodiment, the micro-quantization step size training method applies to both image feature quantization and image weight quantization;
the micro-quantization step size training method comprises the following procedure:
define the full-precision weight as w and the quantized fixed-point weight as q; the quantization operation is expressed as:

$$ q = \mathrm{clip}\left(\mathrm{round}\left(\frac{w}{\alpha}\right), -q_{\min}, q_{\max}\right) $$

where clip(z, a, b) sets elements of the matrix z smaller than a to a and elements larger than b to b; the round operation denotes rounding to the nearest integer; α denotes the differentiable quantization step size; and $-q_{\min}$ and $q_{\max}$ denote the minimum and maximum of the quantization range, respectively;
the floating-point weight corresponding to the fixed-point weight q is recovered through the dequantization operation:

$$ \hat{w} = \alpha \cdot q $$
In a further embodiment, the micro-quantization bias training method comprises the following procedure:
define the full-precision weight as w and the quantized fixed-point weight as q; the quantization operation is expressed as:

$$ q = \mathrm{clip}\left(\mathrm{round}\left(\frac{w - \beta}{\alpha}\right), -q_{\min}, q_{\max}\right) $$

where clip(z, a, b) sets elements of the matrix z smaller than a to a and elements larger than b to b; the round operation denotes rounding to the nearest integer; α denotes the differentiable quantization step size; and $-q_{\min}$ and $q_{\max}$ denote the minimum and maximum of the quantization range, respectively;
the floating-point weight corresponding to the fixed-point weight q is recovered through the dequantization operation:

$$ \hat{w} = \alpha \cdot q + \beta $$

where β is the introduced differentiable bias.
In a further embodiment, the values of $-q_{\min}$ and $q_{\max}$ are as follows: given a quantization bit-width b,
for signed quantization, $q_{\min} = 2^{b-1}$ and $q_{\max} = 2^{b-1} - 1$;
for unsigned quantization, $q_{\min} = 0$ and $q_{\max} = 2^{b} - 1$.
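A minimal NumPy sketch of the two quantizers defined above (forward math only; function names are illustrative, and the bias-enabled variant assumes the β-shifted form given for the micro-quantization bias method):

```python
import numpy as np

def quant_range(b, signed=True):
    """(q_min, q_max) for bit-width b; signed clips to [-q_min, q_max]."""
    if signed:
        return 2 ** (b - 1), 2 ** (b - 1) - 1
    return 0, 2 ** b - 1

def fake_quant(w, alpha, b=8, signed=True, beta=None):
    """Quantize then dequantize; returns the float values the network sees."""
    q_min, q_max = quant_range(b, signed)
    lo = -q_min if signed else q_min
    shifted = w if beta is None else w - beta          # bias variant shifts first
    q = np.clip(np.round(shifted / alpha), lo, q_max)  # fixed-point values q
    w_hat = alpha * q                                  # dequantization
    return w_hat if beta is None else w_hat + beta

w = np.random.randn(4, 4)
print(fake_quant(w, alpha=0.05))                                   # step size only
print(fake_quant(w, alpha=0.05, b=8, signed=False, beta=w.min()))  # with bias
```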
In a further embodiment, the dequantization operation handles the gradient with a straight-through estimator: when α is updated, its gradient is additionally scaled by the factor

$$ g = \frac{1}{\sqrt{N_w \cdot q_{\max}}} $$

where $N_w$ is the number of elements of the full-precision weight w.
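In code, the straight-through estimator and the gradient scaling can both be realized with the detach trick used by public learned-step-size (LSQ-style) implementations; a PyTorch sketch under that assumption, not the patent's own code:

```python
import torch

def grad_scale(x, scale):
    """Identity in the forward pass; multiplies the gradient by `scale`."""
    return (x - x * scale).detach() + x * scale

def round_pass(x):
    """round() in the forward pass; straight-through (identity) gradient."""
    return (x.round() - x).detach() + x

def lsq_fake_quant(w, alpha, q_min, q_max):
    g = 1.0 / (w.numel() * q_max) ** 0.5        # g = 1 / sqrt(N_w * q_max)
    alpha = grad_scale(alpha, g)                # scale the step-size gradient
    q = torch.clamp(round_pass(w / alpha), -q_min, q_max)
    return alpha * q                            # dequantized output

w = torch.randn(128, 128)
alpha = torch.tensor(0.1, requires_grad=True)
out = lsq_fake_quant(w, alpha, q_min=128, q_max=127)   # 8-bit signed
out.sum().backward()
print(alpha.grad)   # gradient reaches the step size despite the round()
```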
In a further embodiment, the quantization parameter initialization based on minimizing the mean square error specifically comprises the following procedure:
for a layer with only the micro-quantization step size α and no bias, initialization solves

$$ \alpha^{*} = \arg\min_{\alpha} \lVert w - \alpha \cdot q \rVert_F^2 $$

assuming α is known, q is solved as $q = \mathrm{clip}(\mathrm{round}(w/\alpha), -q_{\min}, q_{\max})$;
assuming q is known, α is solved in closed form as $\alpha^{*} = \frac{w^{\top} q}{q^{\top} q}$;
q and α are solved alternately in this way until α converges; the converged value is taken as the initial value of α, which is then updated by gradient descent.
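A NumPy sketch of this alternation; the update $\alpha = w^{\top}q / q^{\top}q$ is the least-squares solution for fixed q, and the starting value below is an assumption for illustration:

```python
import numpy as np

def init_alpha_mse(w, q_min, q_max, iters=100, tol=1e-7):
    """Alternately solve q and alpha to minimize ||w - alpha * q||^2."""
    alpha = 2.0 * np.abs(w).mean() / np.sqrt(q_max)      # rough starting point
    for _ in range(iters):
        q = np.clip(np.round(w / alpha), -q_min, q_max)  # alpha fixed, solve q
        new_alpha = float((w * q).sum() / ((q * q).sum() + 1e-12))  # q fixed
        if abs(new_alpha - alpha) < tol:
            break
        alpha = new_alpha
    return alpha

w = np.random.randn(4096)
print(init_alpha_mse(w, q_min=8, q_max=7))   # 4-bit signed layer
```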
In a further embodiment, for a layer with an additional bias β, initialization solves

$$ \alpha^{*}, \beta^{*} = \arg\min_{\alpha, \beta} \lVert w - (\alpha \cdot q + \beta) \rVert_F^2 $$

with q fixed, the least-squares solutions are $\alpha^{*} = \frac{(w - \beta)^{\top} q}{q^{\top} q}$ and

$$ \beta^{*} = E(w - \alpha \cdot q) $$

where E(z) denotes the mean of all elements of the vector z; q, α, and β are solved alternately and iteratively until α and β converge; the converged values are taken as the initial values of α and β, which are then updated by gradient descent.
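The bias-enabled variant alternates over all three unknowns; a sketch under the same assumptions, here applied to an unsigned range as would be used after an activation:

```python
import numpy as np

def init_alpha_beta_mse(x, q_min, q_max, iters=100, tol=1e-7):
    """Alternately solve q, alpha, beta to minimize ||x - (alpha*q + beta)||^2."""
    alpha = 2.0 * np.abs(x).mean() / np.sqrt(q_max)
    beta = float(x.mean())
    for _ in range(iters):
        q = np.clip(np.round((x - beta) / alpha), q_min, q_max)
        new_alpha = float(((x - beta) * q).sum() / ((q * q).sum() + 1e-12))
        new_beta = float((x - new_alpha * q).mean())   # beta* = E(x - alpha*q)
        if abs(new_alpha - alpha) < tol and abs(new_beta - beta) < tol:
            break
        alpha, beta = new_alpha, new_beta
    return alpha, beta

x = np.random.randn(4096) * 0.5 - 0.1   # stand-in for GELU activations
print(init_alpha_beta_mse(x, q_min=0, q_max=15))   # 4-bit unsigned layer
```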
A visual Transformer compression system based on micro-quantization training comprises:
a quantization processing layer configured to partition an input picture into blocks and convert it into a corresponding picture sequence through linear mapping;
a self-attention layer configured to perform global information quantization processing on the picture sequence;
a feedforward layer configured to perform local information quantization processing on the picture sequence, the feedforward layer comprising an activation layer, wherein the self-attention layer and the feedforward layer alternate M times;
a classification processing layer configured to classify the compressed picture sequence and output a predicted probability value;
the system further comprises: a micro-quantization step size training module embedded in turn in the quantization processing layer, the self-attention layer, the feedforward layer, and the classification processing layer, the module configured to improve how well each quantization step size matches the image data;
and a micro-quantization bias training module embedded in the activation layer, the module configured to learn an optimal quantization interval automatically and preserve the information in the negative activation region.
In a further embodiment, the system further comprises a quantization parameter initialization module connected to both the micro-quantization step size training module and the micro-quantization bias training module.
The invention has the following beneficial effects: a micro-quantization step size training method is introduced into the compression process so that the quantizer step size better matches the distribution of the data, greatly reducing the quantization error. At the same time, a micro-quantization bias training method is introduced when quantizing local information, so that the information in the negative activation region is preserved. When the micro-quantization step size training method and the micro-quantization bias training method are run, quantization parameter initialization based on minimizing the mean square error is used, which ensures the convergence speed of the model and avoids degrading the performance of the quantized model through slow convergence.
Drawings
Fig. 1 is a diagram of self-attention layer quantization.
Fig. 2 is an activation comparison diagram.
Detailed Description
The invention is further described with reference to the drawings and the specific embodiments in the following description.
Example 1
This embodiment discloses a visual Transformer compression method based on micro-quantization training, comprising the following steps:
step one, partitioning an input picture into blocks and converting it into a corresponding picture sequence through linear mapping; in this embodiment, performance close to the full-precision visual Transformer model can be achieved with either 8-bit quantization (4x compression) or 4-bit quantization (8x compression).
step two, subjecting the picture sequence to M rounds of alternating quantized processing of global information and local information to obtain a compressed picture sequence, where M is an integer; the M rounds of alternating quantization improve the quality of the quantized model while still guaranteeing fast compression. M is 12 in this embodiment.
step three, classifying the compressed picture sequence and outputting a predicted probability value; in this embodiment, 8-bit quantization (4x compression) may be used.
A micro-quantization step size training method is introduced throughout steps one to three, improving how well each quantization step size matches the image data; in other words, the micro-quantization step size training method is applied every time a quantization operation is performed.
At the same time, a micro-quantization bias training method is introduced when local information is quantized, so that an optimal quantization interval is learned automatically and the information in the negative activation region is preserved; in other words, the micro-quantization bias training method is applied every time the activation layer is executed.
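Where the two quantizers sit inside one of the M alternating units can be sketched as follows. A standard DeiT-style block is assumed; only the bias quantizer after the GELU is shown explicitly, the STE is omitted, and all module and parameter names are illustrative rather than taken from the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fq(t, alpha, lo, hi, beta=0.0):
    """Fake-quantize (forward math only; the method's STE is omitted here)."""
    return alpha * torch.clamp(torch.round((t - beta) / alpha), lo, hi) + beta

class QuantViTBlock(nn.Module):
    """One of the M alternating units: self-attention carries the global
    information, the feed-forward layer the local information."""
    def __init__(self, dim=192, heads=3):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fc1, self.fc2 = nn.Linear(dim, 4 * dim), nn.Linear(4 * dim, dim)
        self.alpha = nn.Parameter(torch.tensor(0.05))  # learned step size
        self.beta = nn.Parameter(torch.tensor(-0.1))   # learned bias after GELU

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]             # global information
        h = F.gelu(self.fc1(self.norm2(x)))       # local information
        h = fq(h, self.alpha, 0, 255, self.beta)  # 8-bit unsigned + bias keeps
        return x + self.fc2(h)                    # the negative GELU region

x = torch.randn(2, 197, 192)   # (batch, tokens, dim), DeiT-Tiny-like shapes
print(QuantViTBlock()(x).shape)
```

In the full method every weight matrix and intermediate feature in the block is wrapped the same way, each with its own learned step size.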
In a further embodiment, the micro-quantization step size training method applies to both image feature quantization and image weight quantization; that is, the quantization strategies for the features and the weights of an image are the same. Taking weight quantization as an example, define the full-precision weight as w and the quantized fixed-point weight as q; the quantization operation is expressed as:

$$ q = \mathrm{clip}\left(\mathrm{round}\left(\frac{w}{\alpha}\right), -q_{\min}, q_{\max}\right) $$

where clip(z, a, b) sets elements of the matrix z smaller than a to a and elements larger than b to b; the round operation denotes rounding to the nearest integer; α denotes the differentiable quantization step size; and $-q_{\min}$ and $q_{\max}$ denote the minimum and maximum of the quantization range, respectively;
the floating-point weight corresponding to the fixed-point weight q is recovered through the dequantization operation:

$$ \hat{w} = \alpha \cdot q $$
In another embodiment, using the GELU activation function in the local information quantization process improves the performance of the model. Compared with the ReLU activation function, GELU introduces negative activation values. That is, as shown in Fig. 2, the GELU activation layer cannot directly use unsigned quantization the way ReLU can, since doing so would lose the information contained in the negative activation values.
Therefore, to solve this problem, the micro-quantization bias training method is introduced for local information quantization; the method comprises the following procedure:
define the full-precision weight as w and the quantized fixed-point weight as q; the quantization operation is expressed as:

$$ q = \mathrm{clip}\left(\mathrm{round}\left(\frac{w - \beta}{\alpha}\right), -q_{\min}, q_{\max}\right) $$

where clip(z, a, b) sets elements of the matrix z smaller than a to a and elements larger than b to b; the round operation denotes rounding to the nearest integer; α denotes the differentiable quantization step size; and $-q_{\min}$ and $q_{\max}$ denote the minimum and maximum of the quantization range, respectively;
the floating-point weight corresponding to the fixed-point weight q is recovered through the dequantization operation:

$$ \hat{w} = \alpha \cdot q + \beta $$

where β is the introduced differentiable bias.
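A small numeric illustration of why the bias matters for GELU outputs; α and β are set by hand here purely for illustration, whereas the method learns both:

```python
import numpy as np

def gelu(x):   # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

a = gelu(np.linspace(-3.0, 3.0, 7))
alpha = a.max() / 15                      # 4-bit unsigned, q_max = 15

relu_style = alpha * np.clip(np.round(a / alpha), 0, 15)
print(relu_style)                         # every negative activation becomes 0

beta = a.min()                            # shift so negatives fit in [0, 15]
with_bias = alpha * np.clip(np.round((a - beta) / alpha), 0, 15) + beta
print(with_bias)                          # negative region survives quantization
```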
When the micro-quantization step size training method and the micro-quantization bias training method perform the dequantization operation, the round operation suffers from vanishing gradients during backpropagation, so the STE (straight-through estimator) is used to handle the gradient; that is, the round operation is ignored during backpropagation. When α is updated, its gradient is additionally scaled by the factor

$$ g = \frac{1}{\sqrt{N_w \cdot q_{\max}}} $$

where $N_w$ is the number of elements of the full-precision weight w.
This technique prevents α from changing so drastically that the model fails to converge. Compared with traditional fixed-step-size quantization training, the micro-quantization step size training method introduced in this embodiment therefore lets the quantizer step size better match the distribution of the data, greatly reducing the quantization error.
In a further embodiment, the values of $-q_{\min}$ and $q_{\max}$ are as follows: given a quantization bit-width b,
for signed quantization, $q_{\min} = 2^{b-1}$ and $q_{\max} = 2^{b-1} - 1$;
for unsigned quantization, $q_{\min} = 0$ and $q_{\max} = 2^{b} - 1$.
In another embodiment, although the differentiable quantization step size and bias are learnable parameters, choosing an appropriate quantization parameter initialization is still important: a poorly chosen initialization slows model convergence and hurts the performance of the model obtained by quantization training.
Therefore, when the micro-quantization step size training method and/or the micro-quantization bias training method is performed, quantization parameter initialization based on minimizing the mean square error is also included, specifically comprising the following procedure:
for a layer with only the micro-quantization step size α and no bias, initialization solves

$$ \alpha^{*} = \arg\min_{\alpha} \lVert w - \alpha \cdot q \rVert_F^2 $$

assuming α is known, q is solved as $q = \mathrm{clip}(\mathrm{round}(w/\alpha), -q_{\min}, q_{\max})$;
assuming q is known, α is solved in closed form as $\alpha^{*} = \frac{w^{\top} q}{q^{\top} q}$;
q and α are solved alternately in this way until α converges; the converged value is taken as the initial value of α, which is then updated by gradient descent.
Similarly, in the dequantization operation the gradient is handled with the straight-through estimator: when α is updated, its gradient is additionally scaled by the factor $g = 1/\sqrt{N_w \cdot q_{\max}}$, where $N_w$ is the number of elements of the full-precision weight w.
Based on this method, tests were carried out on DeiT-Tiny and DeiT-Small with the ImageNet 2012 dataset; accuracy is the Top-1 result on its validation set. Table 1-1 shows the compression ratios and Top-1 classification accuracy of the different models at different bit-widths, where FP32 denotes the model represented in 32-bit floating point, i.e., the full-precision model, and Int8 and Int4 denote the 8-bit and 4-bit quantized models, respectively. For 8-bit quantization, fine-tuning for only 1 epoch suffices, whereas 4-bit quantization requires fine-tuning for 300 epochs. As the table shows, the accuracy loss of the quantized model is within 0.5% for both Int8 and Int4.
[Table 1-1 is reproduced only as an image in the original; it lists compression ratios and Top-1 accuracy for DeiT-Tiny and DeiT-Small at FP32, Int8, and Int4.]
Table 1-1. Visual Transformer quantization training experiment results
Example 2
To implement the visual Transformer compression method described in Embodiment 1, this embodiment discloses a visual Transformer compression system based on micro-quantization training, comprising:
a quantization processing layer configured to partition an input picture into blocks and convert it into a corresponding picture sequence through linear mapping; in this embodiment, performance close to the full-precision visual Transformer model can be achieved with either 8-bit quantization (4x compression) or 4-bit quantization (8x compression).
a self-attention layer configured to perform global information quantization processing on the picture sequence; by quantizing the weights and features in the self-attention layer into fixed-point numbers, all operations in the self-attention layer are realized as fixed-point matrix multiplications. For the attention weights (attention scores), this embodiment uses unsigned quantization because their values are always positive; all other weights and features use signed quantization. The scheme is shown in Fig. 1, and the symbols used in Fig. 1 are explained in Table 2; a sketch of the quantized self-attention computation follows Table 2.
Table 2. Interpretation of the symbols in Fig. 1 [reproduced only as an image in the original; contents not recoverable from the text]
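Since Fig. 1 and Table 2 survive only as images, the following single-head sketch approximates what the figure depicts: signed 8-bit quantization for weights and features, unsigned quantization for the attention scores. One shared step size `a` is used for brevity, whereas the method learns a separate step size per quantizer; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def fq(t, alpha, lo, hi):
    """Fake-quantize (forward math only)."""
    return alpha * torch.clamp(torch.round(t / alpha), lo, hi)

def quant_self_attention(x, w_qkv, w_out, a=0.02):
    d = x.shape[-1]
    qkv = fq(x, a, -128, 127) @ fq(w_qkv, a, -128, 127)   # signed 8-bit matmul
    q, k, v = qkv.chunk(3, dim=-1)
    scores = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    scores = fq(scores, 1 / 255, 0, 255)                  # unsigned: scores > 0
    out = fq(scores @ fq(v, a, -128, 127), a, -128, 127)  # signed features
    return out @ fq(w_out, a, -128, 127)

x = torch.randn(1, 16, 64)                                # (batch, tokens, dim)
w_qkv, w_out = torch.randn(64, 192), torch.randn(64, 64)
print(quant_self_attention(x, w_qkv, w_out).shape)        # torch.Size([1, 16, 64])
```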
a feedforward layer configured to perform local information quantization processing on the picture sequence, the feedforward layer comprising an activation layer, wherein the self-attention layer and the feedforward layer alternate M times;
a classification processing layer configured to classify the compressed picture sequence and output a predicted probability value; in this embodiment, 8-bit quantization (4x compression) may be used.
The system further comprises: a micro-quantization step size training module embedded in turn in the quantization processing layer, the self-attention layer, the feedforward layer, and the classification processing layer, the module configured to improve how well each quantization step size matches the image data;
and a micro-quantization bias training module embedded in the activation layer, the module configured to learn an optimal quantization interval automatically and preserve the information in the negative activation region.

Claims (10)

1. A visual Transformer compression method based on micro-quantization training, characterized by comprising the following steps:
step one, partitioning an input picture into blocks and converting it into a corresponding picture sequence through linear mapping;
step two, subjecting the picture sequence to M rounds of alternating quantized processing of global information and local information to obtain a compressed picture sequence, where M is an integer;
step three, classifying the compressed picture sequence and outputting a predicted probability value; a micro-quantization step size training method is introduced throughout steps one to three, improving how well each quantization step size matches the image data; at the same time, in step two, a micro-quantization bias training method is introduced when local information is quantized, so that an optimal quantization interval is learned automatically and the information in the negative activation region is preserved.
2. The visual Transformer compression method based on micro-quantization training of claim 1, characterized by further comprising quantization parameter initialization based on minimizing the mean square error when the micro-quantization step size training method and/or the micro-quantization bias training method is performed.
3. The visual Transformer compression method based on micro-quantization training of claim 1, characterized in that the micro-quantization step size training method applies to both image feature quantization and image weight quantization;
the micro-quantization step size training method comprises the following procedure:
define the full-precision weight as w and the quantized fixed-point weight as q; the quantization operation is expressed as:

$$ q = \mathrm{clip}\left(\mathrm{round}\left(\frac{w}{\alpha}\right), -q_{\min}, q_{\max}\right) $$

where clip(z, a, b) sets elements of the matrix z smaller than a to a and elements larger than b to b; the round operation denotes rounding to the nearest integer; α denotes the differentiable quantization step size; and $-q_{\min}$ and $q_{\max}$ denote the minimum and maximum of the quantization range, respectively;
the floating-point weight corresponding to the fixed-point weight q is recovered through the dequantization operation:

$$ \hat{w} = \alpha \cdot q $$
4. The visual Transformer compression method based on micro-quantization training of claim 1, characterized in that the micro-quantization bias training method comprises the following procedure:
define the full-precision weight as w and the quantized fixed-point weight as q; the quantization operation is expressed as:

$$ q = \mathrm{clip}\left(\mathrm{round}\left(\frac{w - \beta}{\alpha}\right), -q_{\min}, q_{\max}\right) $$

where clip(z, a, b) sets elements of the matrix z smaller than a to a and elements larger than b to b; the round operation denotes rounding to the nearest integer; α denotes the differentiable quantization step size; and $-q_{\min}$ and $q_{\max}$ denote the minimum and maximum of the quantization range, respectively;
the floating-point weight corresponding to the fixed-point weight q is recovered through the dequantization operation:

$$ \hat{w} = \alpha \cdot q + \beta $$

where β is the introduced differentiable bias.
5. The visual Transformer compression method based on micro-quantization training of claim 3 or 4, characterized in that the values of $-q_{\min}$ and $q_{\max}$ are as follows: given a quantization bit-width b,
for signed quantization, $q_{\min} = 2^{b-1}$ and $q_{\max} = 2^{b-1} - 1$;
for unsigned quantization, $q_{\min} = 0$ and $q_{\max} = 2^{b} - 1$.
6. The visual Transformer compression method based on micro-quantization training of claim 3 or 4, characterized in that,
in the dequantization operation, the gradient is handled with a straight-through estimator: when α is updated, its gradient is additionally scaled by the factor

$$ g = \frac{1}{\sqrt{N_w \cdot q_{\max}}} $$

where $N_w$ is the number of elements of the full-precision weight w.
7. The visual Transformer compression method based on micro-quantization training of claim 2, characterized in that the quantization parameter initialization based on minimizing the mean square error specifically comprises the following procedure:
for a layer with only the micro-quantization step size α and no bias, initialization solves

$$ \alpha^{*} = \arg\min_{\alpha} \lVert w - \alpha \cdot q \rVert_F^2 $$

assuming α is known, q is solved as $q = \mathrm{clip}(\mathrm{round}(w/\alpha), -q_{\min}, q_{\max})$;
assuming q is known, α is solved in closed form as $\alpha^{*} = \frac{w^{\top} q}{q^{\top} q}$;
q and α are solved alternately in this way until α converges; the converged value is taken as the initial value of α, which is then updated by gradient descent.
8. The visual Transformer compression method based on micro-quantization training of claim 2, characterized in that, for a layer with an additional bias β, initialization solves

$$ \alpha^{*}, \beta^{*} = \arg\min_{\alpha, \beta} \lVert w - (\alpha \cdot q + \beta) \rVert_F^2 $$

with q fixed, the least-squares solutions are $\alpha^{*} = \frac{(w - \beta)^{\top} q}{q^{\top} q}$ and

$$ \beta^{*} = E(w - \alpha \cdot q) $$

where E(z) denotes the mean of all elements of the vector z; q, α, and β are solved alternately and iteratively until α and β converge; the converged values are taken as the initial values of α and β, which are then updated by gradient descent.
9. A visual Transformer compression system based on micro-quantization training, characterized by comprising:
a quantization processing layer configured to partition an input picture into blocks and convert it into a corresponding picture sequence through linear mapping;
a self-attention layer configured to perform global information quantization processing on the picture sequence;
a feedforward layer configured to perform local information quantization processing on the picture sequence, the feedforward layer comprising an activation layer, wherein the self-attention layer and the feedforward layer alternate M times;
a classification processing layer configured to classify the compressed picture sequence and output a predicted probability value;
further comprising: a micro-quantization step size training module embedded in turn in the quantization processing layer, the self-attention layer, the feedforward layer, and the classification processing layer, the module configured to improve how well each quantization step size matches the image data;
and a micro-quantization bias training module embedded in the activation layer, the module configured to learn an optimal quantization interval automatically and preserve the information in the negative activation region.
10. The visual Transformer compression system based on micro-quantization training of claim 9, characterized by further comprising: a quantization parameter initialization module connected to both the micro-quantization step size training module and the micro-quantization bias training module.
CN202210295189.6A 2022-03-24 2022-03-24 Visual Transformer compression method and system based on micro-quantization training Pending CN114756517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210295189.6A CN114756517A (en) 2022-03-24 2022-03-24 Visual Transformer compression method and system based on micro-quantization training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210295189.6A CN114756517A (en) 2022-03-24 2022-03-24 Visual Transformer compression method and system based on micro-quantization training

Publications (1)

Publication Number Publication Date
CN114756517A 2022-07-15

Family

ID=82327804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210295189.6A Pending CN114756517A (en) 2022-03-24 2022-03-24 Visual Transformer compression method and system based on micro-quantization training

Country Status (1)

Country Link
CN (1) CN114756517A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152117A (en) * 2023-04-18 2023-05-23 煤炭科学研究总院有限公司 Underground low-light image enhancement method based on Transformer
CN117689044A (en) * 2024-02-01 2024-03-12 厦门大学 Quantification method suitable for vision self-attention model


Similar Documents

Publication Title
CN108510067B (en) Convolutional neural network quantification method based on engineering realization
CN111079781B (en) Lightweight convolutional neural network image recognition method based on low rank and sparse decomposition
US20170270408A1 (en) Method and System for Bit-Depth Reduction in Artificial Neural Networks
CN114756517A (en) Visual Transformer compression method and system based on micro-quantization training
CN113011571B (en) INT8 offline quantization and integer inference method based on Transformer model
CN107516129A (en) The depth Web compression method decomposed based on the adaptive Tucker of dimension
CN114402596B (en) Neural network model decoding method, device, system and medium
CN111160524A (en) Two-stage convolutional neural network model compression method
CN110619392A (en) Deep neural network compression method for embedded mobile equipment
CN115238893B (en) Neural network model quantification method and device for natural language processing
CN111310888A (en) Method for processing convolutional neural network
CN113918882A (en) Data processing acceleration method of dynamic sparse attention mechanism capable of being realized by hardware
CN114139683A (en) Neural network accelerator model quantization method
CN113610227A (en) Efficient deep convolutional neural network pruning method
CN118153715A (en) Large language model fine tuning method based on low-rank matrix decomposition
CN113204640B (en) Text classification method based on attention mechanism
Gray et al. Vector quantization and density estimation
CN112989843B (en) Intention recognition method, device, computing equipment and storage medium
CN117725435A (en) Multi-mode large model adaptation method and storage medium
CN112257466A (en) Model compression method applied to small machine translation equipment
CN112418388A (en) Method and device for realizing deep convolutional neural network processing
CN115860062A (en) Neural network quantization method and device suitable for FPGA
CN114065913A (en) Model quantization method and device and terminal equipment
CN115965062A (en) FPGA (field programmable Gate array) acceleration method for BERT (binary offset Transmission) middle-layer normalized nonlinear function
CN111985613B (en) Normalization method of convolutional neural network circuit based on L1 norm group normalization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination