CN117689044A - Quantization method suitable for vision self-attention model - Google Patents

Quantization method suitable for vision self-attention model

Info

Publication number
CN117689044A
Application number
CN202410142459.9A
Authority
CN
Other languages
Chinese (zh)
Prior art keywords
quantization, quantizer, layer, log2, uniform
Legal status
Pending
Filing date
2024-02-01
Publication date
2024-03-12
Inventors
纪荣嵘, 胡佳伟, 钟云山, 林明宝, 陈锰钊
Current Assignee
Xiamen University
Application filed by Xiamen University

Abstract

The invention provides a quantization method suitable for visual self-attention models (ViTs) and relates to the compression and acceleration of artificial neural networks. A shift-uniform-log2 quantizer is proposed, which introduces an initial shift bias on the log2 function input and then uniformly quantizes the output. A three-stage smooth optimization strategy is also provided, which fully exploits a smooth, low-magnitude loss landscape for optimization while maintaining the efficiency of layer-wise activation quantization. The method of the invention is simple in concept, saves computational cost, and greatly improves performance at extremely low bit widths; simply by applying the quantizer designed by the invention, a quantized model can be obtained directly in a post-training manner with better performance.

Description

Quantization method suitable for vision self-attention model
Technical Field
The invention relates to the compression and acceleration of artificial neural networks, in particular to a quantization method suitable for visual self-attention models (ViTs).
Background
In the evolving field of computer vision, the recently emerged Vision Transformer (ViT) stands out as an excellent architecture for capturing long-distance relationships between image patches with its multi-head self-attention mechanism (MHSA). However, as the number of image patches n increases, the MHSA operation incurs O(n^2) computational complexity, an unacceptable overhead. To enable better practical application of ViT-series models, a model compression method for the ViT series is designed and proposed.
To accommodate structures unique to visual recognition models, such as LayerNorm and the self-attention mechanism, current work on post-training quantization (PTQ) of ViTs typically introduces specialized quantizers and quantization schemes to preserve the original performance of the ViTs. For example, FQ-ViT and PTQ4ViT introduce a log2 quantizer and a twin uniform quantizer, respectively, for post-Softmax activations, while RepQ-ViT first applies a channel-wise quantizer to the post-LayerNorm activations, whose distribution has large inter-channel variance, and then reparameterizes it into a layer-wise quantizer. In the 4-bit case, RepQ-ViT suffers a 10.82% accuracy drop on ImageNet relative to full-precision DeiT-S; in the 3-bit case the drop is far more pronounced, reaching 74.48%. Recently, optimization-based PTQ methods have shown their potential in quantizing convolutional neural networks (CNNs). However, their application to Vision Transformers remains underexplored; as shown in FIG. 4, they tend to overfit at higher bit widths and suffer significant performance degradation at ultra-low bit widths, limiting their use for the ViT architecture.
In view of this, the present application proposes a quantization method for the visual self-attention model that maintains high performance even at very low bit widths.
Disclosure of Invention
The technical problem the invention aims to solve is to provide a quantization method suitable for visual self-attention models (ViTs): when current ViTs undergo post-training quantization, applying the quantizer designed by the invention allows a quantized model to be obtained directly in a post-training manner while retaining high performance at very low bit widths.
The invention provides a quantization method suitable for a visual self-attention model, which comprises the following steps:
in the initial stage, the model is fine-tuned while the weights are kept at full precision; a channel-wise quantizer is used for the post-LayerNorm activations, a shift-uniform-log2 quantizer is used for the post-Softmax activations, and layer-wise quantizers are used for the other activations;
in the second stage, the channel-wise quantizer is smoothly transitioned to its corresponding layer-wise form using the scale reparameterization technique, so that the post-LayerNorm activations switch from the channel-wise quantizer to a layer-wise quantizer;
in the third stage, the model is fine-tuned using the loss function while both activations and weights are quantized, wherein the post-Softmax activations use the shift-uniform-log2 quantizer and the other activations use layer-wise quantizers.
Further, the shift-uniform-log2 quantizer is as follows: an initial shift bias is introduced on the log2 function input, and the output is then uniformly quantized. The design is specifically:
a shift bias η is introduced before the full-precision activation input is fed to the log2 transformation, and the result is then processed with a uniform quantizer, wherein the quantization process formula is:

$$\bar{x} = \mathrm{quant}\big(\log_2(x + \eta)\big)$$

the inverse quantization process formula is:

$$\hat{x} = 2^{\,\mathrm{dequant}(\bar{x})} - \eta$$

wherein x is the activation value input, x̂ is the de-quantized result, x̄ is the quantized integer value, and quant(·), dequant(·) respectively denote the quantization and inverse quantization calculation processes of uniform quantization, as follows:

$$\mathrm{quant}(y) = \mathrm{clip}\Big(\Big\lfloor \frac{y}{s} \Big\rceil + z,\ 0,\ 2^{b}-1\Big), \qquad \mathrm{dequant}(\bar{x}) = s\,(\bar{x} - z)$$

wherein b represents the bit width, s represents the quantization scale, and z represents the zero point;
further, the quantization of the channel level is smoothly transited to the corresponding hierarchical form by using the scale re-parameterization technology, and the parameters are calculated by adopting the following formula:
wherein the method comprises the steps of,/>Is a parameter of the original LayerNorm layer, < >>,/>Is La after scale heavy parameterParameters of the yerNorm layer, +.>,/>Is a scale heavy parameter calculation parameter, < >> Is the original weight parameter,/->,/>Is the post-scale weight parameter.
Further, the loss function is:

$$\mathcal{L} = \left\| \mathbf{O}_l^{fp} - \mathbf{O}_l^{q} \right\|_2^{2}$$

wherein O_l^{fp} represents the output of the l-th module of the full-precision visual self-attention model, and O_l^{q} represents the output of the l-th module of the quantized visual self-attention model.
The invention has the following technical effects or advantages:
1. The present invention proposes a shift-uniform-log2 quantizer (SULQ) that achieves full coverage of the input domain and an accurate approximation of its distribution by introducing a shift bias before the log2 transformation and then uniformly quantizing its output;
2. The invention provides a three-stage smooth optimization strategy (SOS) that fully exploits a smooth, low-magnitude loss landscape for optimization while maintaining the efficiency of layer-wise activation quantization;
3. The method is simple and easy to implement, saves computational cost, and greatly improves performance, exceeding various mainstream post-training quantization methods; the advantage is more pronounced at lower bit widths.
Drawings
The invention will be further described with reference to example embodiments and the accompanying drawings.
FIG. 1 is a flowchart illustrating a quantization method for a visual self-attention model according to a first embodiment of the present invention;
FIG. 2 is a schematic block diagram of the principle logic of a conventional quantizer;
FIG. 3 is a schematic block diagram of the principle logic of the shift uniform log2 quantizer of the present invention;
FIG. 4 is a first diagram comparing the effect of an embodiment of the present application with other methods;
FIG. 5 is a second diagram comparing the effect of an embodiment of the present application with other methods.
Detailed Description
The invention provides a quantization method suitable for the visual self-attention model, adopting the following technical scheme: the invention proposes a shift-uniform-log2 quantizer (SULQ), which introduces an initial shift bias on the log2 function input and then uniformly quantizes the output; meanwhile, a three-stage smooth optimization strategy (SOS) is provided, which fully exploits a smooth, low-magnitude loss landscape for optimization while maintaining the efficiency of layer-wise activation quantization.
In order to better understand the technical scheme of the present invention, the following detailed description will be given with reference to the accompanying drawings and specific embodiments.
Referring to fig. 1, an embodiment of the present invention provides a quantization method applicable to a visual self-attention model, the method including:
in the initial stage, the model is fine-tuned while the weights are kept at full precision; a channel-wise quantizer is used for the post-LayerNorm activations, a shift-uniform-log2 quantizer is used for the post-Softmax activations, and layer-wise quantizers are used for the other activations;
in the second stage, the channel-wise quantizer is smoothly transitioned to its corresponding layer-wise form using the scale reparameterization technique, so that the post-LayerNorm activations switch from the channel-wise quantizer to a layer-wise quantizer;
in the third stage, the model is fine-tuned using the loss function while both activations and weights are quantized, wherein the post-Softmax activations use the shift-uniform-log2 quantizer and the other activations use layer-wise quantizers.
Preferably, the shift-uniform-log2 quantizer is as follows: an initial shift bias is introduced on the log2 function input, and the output is then uniformly quantized. The design is specifically:
a shift bias η is introduced before the full-precision activation input is fed to the log2 transformation, and the result is then processed with a uniform quantizer, wherein the quantization process formula is:

$$\bar{x} = \mathrm{quant}\big(\log_2(x + \eta)\big)$$

the inverse quantization process formula is:

$$\hat{x} = 2^{\,\mathrm{dequant}(\bar{x})} - \eta$$

wherein x is the activation value input, x̂ is the de-quantized result, x̄ is the quantized integer value, and quant(·), dequant(·) respectively denote the quantization and inverse quantization calculation processes of uniform quantization, as follows:

$$\mathrm{quant}(y) = \mathrm{clip}\Big(\Big\lfloor \frac{y}{s} \Big\rceil + z,\ 0,\ 2^{b}-1\Big), \qquad \mathrm{dequant}(\bar{x}) = s\,(\bar{x} - z)$$

wherein b represents the bit width, s represents the quantization scale, and z represents the zero point;
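For illustration only, the following is a minimal PyTorch sketch of the shift-uniform-log2 quantizer described above; the function names (sulq_quant, sulq_dequant) and the simple min/max calibration of the uniform scale are assumptions made for this example and are not prescribed by the invention.

```python
import torch

def uniform_params(y: torch.Tensor, bits: int):
    # Assumed min/max calibration of the uniform quantizer in the log2 domain.
    s = (y.max() - y.min()) / (2 ** bits - 1)        # quantization scale s
    z = torch.round(-y.min() / s)                    # zero point z
    return s, z

def sulq_quant(x: torch.Tensor, eta: float, bits: int):
    """Quantize: shift by eta, take log2, then apply uniform quantization."""
    y = torch.log2(x + eta)                          # shifted log2 transform
    s, z = uniform_params(y, bits)
    x_int = torch.clamp(torch.round(y / s) + z, 0, 2 ** bits - 1)
    return x_int, s, z

def sulq_dequant(x_int: torch.Tensor, s, z, eta: float):
    """Dequantize: uniform dequantization, then invert the shifted log2."""
    return 2 ** (s * (x_int - z)) - eta

# Usage on a post-Softmax activation (values in (0, 1]).
attn = torch.softmax(torch.randn(4, 197, 197), dim=-1)
x_int, s, z = sulq_quant(attn, eta=1e-2, bits=3)
x_hat = sulq_dequant(x_int, s, z, eta=1e-2)
print((attn - x_hat).abs().mean())
```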
preferably, the quantization of the channel level is smoothly transited to the hierarchical form corresponding to the quantization by using the scale re-parameterization technique, and the parameters are calculated by adopting the following formula:
wherein the method comprises the steps of,/>Is a parameter of the original LayerNorm layer, < >>,/>Is a parameter of LayerNorm layer after scale heavy parameter, +.>,/>Is a scale heavy parameter calculation parameter, < >> Is the original weight parameter,/->,/>Is the post-scale weight parameter.
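As an illustration of the scale reparameterization step, the following sketch absorbs the channel-wise quantization parameters into the LayerNorm affine parameters and the following linear layer, consistent with the formulas above; the function name and the use of the mean as the layer-wise scale and zero point are assumptions made for this example.

```python
import torch

@torch.no_grad()
def reparameterize(ln: torch.nn.LayerNorm, fc: torch.nn.Linear,
                   s: torch.Tensor, z: torch.Tensor):
    """Turn channel-wise activation quantization params (s, z) into layer-wise ones
    by rescaling the LayerNorm affine params and the next linear layer."""
    s_tilde, z_tilde = s.mean(), z.mean()        # assumed layer-wise scale / zero point
    r1 = s / s_tilde                             # per-channel variation factors
    r2 = z - z_tilde
    # LayerNorm: gamma~ = gamma / r1, beta~ = (beta + r2 * s) / r1
    ln.weight.div_(r1)
    ln.bias.add_(r2 * s).div_(r1)
    # Next linear layer: bias must be corrected with the original weight first,
    # b~ = b - W (s * r2); then W~[:, c] = r1[c] * W[:, c]
    fc.bias.sub_(fc.weight @ (s * r2))
    fc.weight.mul_(r1)                           # broadcasts over input channels
    return s_tilde, z_tilde
```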
Preferably, the loss function is:

$$\mathcal{L} = \left\| \mathbf{O}_l^{fp} - \mathbf{O}_l^{q} \right\|_2^{2}$$

wherein O_l^{fp} represents the output of the l-th module of the full-precision visual self-attention model, and O_l^{q} represents the output of the l-th module of the quantized visual self-attention model.
Some symbols in this application are explained below:
1. general structure of the ViTs:
Assume an image I is split by the embedding layer into N two-dimensional patches; the image is then represented as the token matrix

$$X \in \mathbb{R}^{N \times D}$$

The tokens X are then fed into L ViT blocks, each comprising a multi-head self-attention operation (MHSA) and a multi-layer perceptron (MLP). For the l-th block, the computation can be expressed as:

$$Z_l = X_{l-1} + \mathrm{MHSA}(\mathrm{LN}(X_{l-1})), \qquad X_l = Z_l + \mathrm{MLP}(\mathrm{LN}(Z_l))$$

where Z_l denotes the output of the MHSA branch and X_l the output of the MLP branch. MHSA consists of H self-attention heads. Let X̂ = LN(X_{l-1}). For the h-th self-attention head with input X̂, the computation is:

$$[Q_h, K_h, V_h] = \hat{X} W_h^{qkv} + b_h^{qkv}, \qquad A_h = \mathrm{Softmax}\!\left(\frac{Q_h K_h^{\top}}{\sqrt{D_h}}\right)$$

wherein A_h and [Q_h, K_h, V_h] are intermediate results, D_h denotes the dimension of a self-attention head, W_h^{qkv} denotes the QKV linear-layer weights, and b_h^{qkv} denotes the QKV linear-layer bias.

The output of the MHSA may be expressed as:

$$\mathrm{MHSA}(\hat{X}) = [A_1 V_1, A_2 V_2, \ldots, A_H V_H]\, W_0 + b_0$$

The MLP consists of two fully connected layers and an activation function (GELU); when the activation Y is fed into the MLP, the computation is:

$$\mathrm{MLP}(Y) = \mathrm{GELU}(Y W_1 + b_1)\, W_2 + b_2$$

wherein b_0, b_1, b_2, W_0, W_1, W_2 are weight parameters.
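For reference, a compact PyTorch sketch of one such ViT block is given below; the dimensions, module names, and layout are illustrative assumptions and not part of the invention.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d=384, heads=6, mlp_ratio=4):
        super().__init__()
        self.ln1, self.ln2 = nn.LayerNorm(d), nn.LayerNorm(d)
        self.qkv = nn.Linear(d, 3 * d)           # W^qkv, b^qkv for all heads
        self.proj = nn.Linear(d, d)              # W_0, b_0
        self.fc1 = nn.Linear(d, mlp_ratio * d)   # W_1, b_1
        self.fc2 = nn.Linear(mlp_ratio * d, d)   # W_2, b_2
        self.h, self.dh = heads, d // heads

    def forward(self, x):                        # x: [B, N, D]
        B, N, D = x.shape
        q, k, v = self.qkv(self.ln1(x)).chunk(3, dim=-1)
        q, k, v = (t.view(B, N, self.h, self.dh).transpose(1, 2) for t in (q, k, v))
        a = torch.softmax(q @ k.transpose(-2, -1) / self.dh ** 0.5, dim=-1)   # A_h
        z = x + self.proj((a @ v).transpose(1, 2).reshape(B, N, D))           # Z_l
        return z + self.fc2(nn.functional.gelu(self.fc1(self.ln2(z))))        # X_l
```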
2. Analysis of existing log2 quantizer
FIG. 2 plots the relationship between the full-precision activation value x and its quantized counterpart x̂ when a uniform quantizer and a log2 quantizer are used. Compared with the uniform quantizer, the log2 quantizer allocates more quantization levels to the region near zero, which is advantageous for the long-tail distributions prevalent in post-Softmax activations. However, the log2 quantizer also has a serious quantization-efficiency problem. Consider a post-Softmax activation X whose inputs range over [1.08e-8, 0.868]: after rounding, -log2(X) spans from 0 up to 26. A 3-bit quantizer only covers [0, 7], so the rounded values in [8, 26] are all clamped to 7; with 4-bit quantization, the rounded values in [16, 26] are clamped to 15. A quantization-efficiency problem therefore arises, because many values are clamped to a single remote level, and post-Softmax activations contain a large number of values near zero. This inefficiency produces large quantization errors and thus degrades model performance.
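The clamping effect described above can be reproduced with a few lines of Python (the concrete range [1.08e-8, 0.868] follows the example above; the rest is illustrative):

```python
import math

x_min, x_max = 1.08e-8, 0.868
# Rounded -log2 span of the post-Softmax activations: 0 .. 26
print(round(-math.log2(x_max)), round(-math.log2(x_min)))

for bits in (3, 4):
    top = 2 ** bits - 1                           # 3 bit -> 7, 4 bit -> 15
    print(f"{bits}-bit log2 quantizer clamps rounded values in [{top + 1}, 26] to {top}")
```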
FIG. 3 compares the quantization positions produced by a 3-bit uniform quantizer, a log2 quantizer, and the shift-uniform-log2 quantizer (SULQ) of the present invention. Compared with the existing log2 quantizer, SULQ realizes different quantization effects depending on the shift value; by adjusting the shift value, different input distributions can be quantized more faithfully, so SULQ achieves a more flexible quantization than the existing log2 quantizer.
3. Description of the quantization method
The quantization method of the invention comprises the following steps:
1) A shift uniform log2 quantizer (SULQ) that introduces an initial shift bias on the log2 function input and then uniformly quantizes the output;
2) A three-stage smooth optimization strategy (SOS) that leverages a smooth, low-magnitude loss landscape for optimization while maintaining the efficiency of layer-wise activation quantization.
The existing log2 quantizer suffers from low quantization efficiency when processing post-Softmax activations, i.e., its quantization range cannot cover the whole input domain. To solve this problem, a shift-uniform-log2 quantizer (SULQ) is proposed herein, which achieves full coverage of the input domain and an accurate approximation of its distribution by introducing a shift bias before the log2 transformation and then uniformly quantizing its output, as shown in FIG. 3.
Specifically, in 1), the shift-uniform-log2 quantizer (SULQ) introduces a shift bias η before the full-precision input x is fed to the log2 transformation, and then processes the result with a uniform quantizer. Quantization process:

$$\bar{x} = \mathrm{quant}\big(\log_2(x + \eta)\big)$$

Inverse quantization process:

$$\hat{x} = 2^{\,\mathrm{dequant}(\bar{x})} - \eta$$

wherein quant(·) and dequant(·) denote the uniform quantization and inverse quantization calculation processes, which can be calculated as follows:

$$\mathrm{quant}(y) = \mathrm{clip}\Big(\Big\lfloor \frac{y}{s} \Big\rceil + z,\ 0,\ 2^{b}-1\Big), \qquad \mathrm{dequant}(\bar{x}) = s\,(\bar{x} - z)$$
in 2), the three-stage Smoothing Optimization Strategy (SOS) is specifically:
in the initial stage, the model is finely adjusted, and full-precision weight and LayerNorm based on a channel are used for quantization after activation, and other activation adopts a layer-by-layer quantizer; in the second stage, smoothly transitioning the quantizer of the channel level to a hierarchical form corresponding to the quantizer by skillfully utilizing a scale re-parameterization technique;
in the third stage, the model is finely tuned, and meanwhile, the activation and the weight are quantized, so that performance degradation caused by weight quantization is compensated, and performance of the quantized model is improved.
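The three stages can be summarized in the following sketch; the routine names, the optimizer choice, the hypothetical hooks quantize_weights and reparameterize_layernorm_quantizer, and the block-wise reconstruction loop are assumptions made purely for illustration.

```python
import torch

def reconstruct_block(fp_block, q_block, calib_loader, steps=1000, lr=1e-4):
    """Block-wise tuning: minimize || O_l^fp - O_l^q ||^2 on calibration data."""
    opt = torch.optim.Adam(q_block.parameters(), lr=lr)
    for _, x in zip(range(steps), calib_loader):
        with torch.no_grad():
            target = fp_block(x)                  # O_l^fp from the full-precision model
        loss = (target - q_block(x)).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

def smooth_optimization(fp_blocks, q_blocks, calib_loader):
    # Stage 1: fine-tune with full-precision weights (channel-wise quantizer after
    # LayerNorm, SULQ after Softmax, layer-wise quantizers elsewhere).
    for fp_b, q_b in zip(fp_blocks, q_blocks):
        q_b.quantize_weights = False              # hypothetical flag
        reconstruct_block(fp_b, q_b, calib_loader)
    # Stage 2: fold channel-wise params into layer-wise ones (scale reparameterization).
    for q_b in q_blocks:
        q_b.reparameterize_layernorm_quantizer()  # hypothetical hook
    # Stage 3: fine-tune again with both activations and weights quantized.
    for fp_b, q_b in zip(fp_blocks, q_blocks):
        q_b.quantize_weights = True               # hypothetical flag
        reconstruct_block(fp_b, q_b, calib_loader)
```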
Extensive experiments verify that the quantization method for visual self-attention models (ViTs) provided by the invention is simple in concept, saves computational cost, and greatly improves performance, exceeding various mainstream post-training quantization methods, with a more pronounced advantage at lower bit widths. For ViT-B in the 3-bit case, the method herein improves accuracy by 50.68% over the existing PTQ method.
4. Implementation details
All algorithms are implemented on the PyTorch framework (Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), pages 8026-8037, 2019). Uniform quantization is used for all weights and activation values, except for the post-Softmax activations, which use the shift-uniform-log2 quantizer. A straight-through estimator (STE) is used for gradient estimation of the rounding operation in the quantization process. All experimental configurations use 1024 calibration images, drawn from the ImageNet and COCO datasets.
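The straight-through estimator mentioned above can be realized in PyTorch with the common idiom below (shown for completeness; not specific to this invention):

```python
import torch

def ste_round(x: torch.Tensor) -> torch.Tensor:
    # Forward: round(x); backward: identity gradient (straight-through estimator).
    return x + (torch.round(x) - x).detach()
```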
The quantized models comprise the ViT, DeiT, and Swin series. The SOTA PTQ competitors, quantized to 6-bit, 4-bit, and 3-bit, respectively, are: FQ-ViT (Yang Lin, Tianyu Zhang, Peiqin Sun, Zheng Li, and Shuang Zhou. FQ-ViT: Post-training quantization for fully quantized vision transformer. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI), pages 1173-1179, 2022), PSAQ-ViT (Zhikai Li, Liping Ma, Mengjuan Chen, Junrui Xiao, and Qingyi Gu. Patch similarity aware data-free quantization for vision transformers. In Proceedings of the European Conference on Computer Vision (ECCV), pages 154-170. Springer, 2022), Ranking-ViT (Zhenhua Liu, Yunhe Wang, Kai Han, Wei Zhang, Siwei Ma, and Wen Gao. Post-training quantization for vision transformer. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), pages 28092-28103, 2021), EasyQuant (Di Wu, Qingming Tang, Yongle Zhao, Ming Zhang, Ying Fu, and Debing Zhang. EasyQuant: Post-training quantization via scale optimization. CoRR, abs/2006.16669, 2020), PTQ4ViT (Zhihang Yuan, Chenhao Xue, Yiqi Chen, Qiang Wu, and Guangyu Sun. PTQ4ViT: Post-training quantization for vision transformers with twin uniform quantization. In Proceedings of the European Conference on Computer Vision (ECCV), pages 191-207. Springer, 2022), APQ-ViT (Yifu Ding, Haotong Qin, Qinghua Yan, Zhenhua Chai, Junjie Liu, Xiaolin Wei, and Xianglong Liu. Towards accurate post-training quantization for vision transformer. In Proceedings of the 30th ACM International Conference on Multimedia (ACM MM), pages 5380-5388, 2022), NoisyQuant (Yijiang Liu, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, and Shanghang Zhang. NoisyQuant: Noisy bias-enhanced post-training activation quantization for vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20321-20330, 2023), BRECQ (Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, and Shi Gu. BRECQ: Pushing the limit of post-training quantization by block reconstruction. In Proceedings of the International Conference on Learning Representations (ICLR), 2021), QDrop (Xiuying Wei, Ruihao Gong, Yuhang Li, Xianglong Liu, and Fengwei Yu. QDrop: Randomly dropping quantization for extremely low-bit post-training quantization. In Proceedings of the International Conference on Learning Representations (ICLR), 2022), PD-Quant (Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, and Wenyu Liu. PD-Quant: Post-training quantization based on prediction difference metric. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24427-24437, 2023), Bit-Shrinking (Chen Lin, Bo Peng, Zheyang Li, Wenming Tan, Ye Ren, Jun Xiao, and Shiliang Pu. Bit-shrinking: Limiting instantaneous sharpness for improving post-training quantization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16196-16205, 2023), and RepQ-ViT (Zhikai Li, Junrui Xiao, Lianwei Yang, and Qingyi Gu. RepQ-ViT: Scale reparameterization for post-training quantization of vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 17227-17236, 2023).
5. Application effects
The method can be applied to the quantization of visual self-attention models (ViTs); by compressing and accelerating the ViTs, it improves the efficiency and accuracy with which the model processes images.
The tables of FIG. 4 and FIG. 5 show the quantization results of the method of the invention and the existing quantization methods on different datasets. It can be seen that, in most cases, the method of the invention is superior to all comparison methods on these models at different bit widths.
FIG. 4 shows the quantization results on ImageNet. It can be seen that the method of the present invention achieves the best performance under different bit settings; the advantage holds at all bit widths and is especially pronounced at low bit widths. As shown in FIG. 4, whether optimization-free or optimization-based, existing approaches suffer a very significant performance degradation in the ultra-low-bit case. For example, in the 3-bit case, PTQ4ViT collapses on all ViT models, while RepQ-ViT has limited accuracy: RepQ-ViT reaches only 0.97%, 4.37% and 4.84% on DeiT-T, DeiT-S and DeiT-B, respectively. Optimization-based approaches provide better results but exhibit unstable performance across different ViT models; for example, BRECQ collapses on ViT-S and Swin-B. In contrast, the proposed method exhibits stable and significantly improved performance over the ViT variants. In particular, the method of the present invention achieves encouraging improvements of 40.72% and 50.68% over previous methods on ViT-S and ViT-B quantization, respectively.
For DeiT-T, DeiT-S and DeiT-B, the method of the present invention achieves 41.52%, 55.78% and 73.30% accuracy, respectively, corresponding to increases of 1.55%, 26.45% and 27.01%. On Swin-S and Swin-B, the method reports increases of 4.53% and 4.98%, respectively. In the 4-bit case, the optimization-free RepQ-ViT outperforms the optimization-based approaches on most ViT variants, indicating that previous optimization-based PTQ approaches have an overfitting problem. The proposed method improves significantly over RepQ-ViT on all ViT variants; in particular, it achieves substantial improvements of 9.82% and 11.59% on ViT-S and ViT-B, respectively, and provides notable accuracy gains of 3.28%, 4.6% and 4.36% when quantizing DeiT-T, DeiT-S and DeiT-B, respectively. On Swin-S and Swin-B, the method exhibits performance gains of 1.72% and 1.48%, respectively. At 6 bits, RepQ-ViT again outperforms the optimization-based approaches on most ViT variants, indicating that the optimization-based approaches suffer from the same over-fitting problem as at 4 bits. Similar to the 3-bit and 4-bit results, the method of the invention shows improved and satisfactory performance; for example, when quantizing DeiT-B, Swin-S and Swin-B, it reaches accuracies of 81.68%, 82.89% and 84.94%, respectively, only 0.12%, 0.34% and 0.33% below the full-precision model.
Quantization results for the COCO dataset are shown in FIG. 5, which reports results for object detection and instance segmentation with all networks quantized to 4 bits. It can be seen that in most cases the method of the invention achieves better performance.
Specifically, when Mask R-CNN uses Swin-T as its backbone, the method of the present invention increases the box AP and mask AP by 1.4 and 0.6 points, respectively. Likewise, in Cascade Mask R-CNN with Swin-T as the backbone, the method increases box AP by 1.2 and mask AP by 0.6; with Swin-S as the backbone, box AP rises by 1.0 and mask AP by 0.5.
The technical scheme provided in the embodiments of the present application has at least the following technical effects or advantages: the present invention proposes a shift-uniform-log2 quantizer (SULQ) that achieves full coverage of the input domain and an accurate approximation of its distribution by introducing a shift bias before the log2 transformation and then uniformly quantizing its output; the invention also provides a three-stage smooth optimization strategy (SOS) that fully exploits a smooth, low-magnitude loss landscape for optimization while maintaining the efficiency of layer-wise activation quantization. By compressing and accelerating the visual self-attention model (ViTs), the method improves the efficiency and accuracy with which the model processes images; it is simple to implement, saves computational cost, and greatly improves performance, exceeding various mainstream post-training quantization methods, with the advantage being more pronounced at lower bit widths, e.g., for ViT-B in the 3-bit case the method improves by 50.68% over the existing PTQ method.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that the specific embodiments described are illustrative only and not intended to limit the scope of the invention, and that equivalent modifications and variations of the invention in light of the spirit of the invention will be covered by the claims of the present invention.

Claims (4)

1. A quantization method suitable for a visual self-attention model, characterized in that the method comprises the following steps:
in the initial stage, the model is fine-tuned while the weights are kept at full precision; a channel-wise quantizer is used for the post-LayerNorm activations, a shift-uniform-log2 quantizer is used for the post-Softmax activations, and layer-wise quantizers are used for the other activations;
in the second stage, the channel-wise quantizer is smoothly transitioned to its corresponding layer-wise form using the scale reparameterization technique, so that the post-LayerNorm activations switch from the channel-wise quantizer to a layer-wise quantizer;
in the third stage, the model is fine-tuned using the loss function while both activations and weights are quantized, wherein the post-Softmax activations use the shift-uniform-log2 quantizer and the other activations use layer-wise quantizers.
2. The quantization method suitable for a visual self-attention model according to claim 1, characterized in that the shift-uniform-log2 quantizer is as follows: an initial shift bias is introduced on the log2 function input, and the output is then uniformly quantized, specifically designed as follows:
a shift bias η is introduced before the full-precision activation input is fed to the log2 transformation, and the result is then processed with a uniform quantizer, wherein the quantization process formula is:

$$\bar{x} = \mathrm{quant}\big(\log_2(x + \eta)\big)$$

the inverse quantization process formula is:

$$\hat{x} = 2^{\,\mathrm{dequant}(\bar{x})} - \eta$$

wherein x is the activation value input, x̂ is the de-quantized result, x̄ is the quantized integer value, and quant(·), dequant(·) respectively represent the quantization and inverse quantization calculation processes of uniform quantization, as follows:

$$\mathrm{quant}(y) = \mathrm{clip}\Big(\Big\lfloor \frac{y}{s} \Big\rceil + z,\ 0,\ 2^{b}-1\Big); \qquad \mathrm{dequant}(\bar{x}) = s\,(\bar{x} - z)$$

wherein b represents the bit width, s represents the quantization scale, z represents the zero point, clip represents the clipping function onto the given interval [0, 2^b - 1], and ⌊·⌉ represents the rounding operation.
3. The quantization method suitable for a visual self-attention model according to claim 1, characterized in that, when the channel-wise quantizer is smoothly transitioned to its corresponding layer-wise form using the scale reparameterization technique, the parameters are calculated by the following formulas:

$$\mathbf{r}_1 = \frac{\mathbf{s}}{\tilde{s}}, \qquad \mathbf{r}_2 = \mathbf{z} - \tilde{z}$$

$$\tilde{\boldsymbol{\gamma}} = \frac{\boldsymbol{\gamma}}{\mathbf{r}_1}, \qquad \tilde{\boldsymbol{\beta}} = \frac{\boldsymbol{\beta} + \mathbf{r}_2 \odot \mathbf{s}}{\mathbf{r}_1}$$

$$\tilde{\mathbf{W}}_{:,c} = r_{1,c}\,\mathbf{W}_{:,c}, \qquad \tilde{\mathbf{b}} = \mathbf{b} - \mathbf{W}\,(\mathbf{r}_2 \odot \mathbf{s})$$

wherein γ, β are the parameters of the original LayerNorm layer; γ̃, β̃ are the parameters of the LayerNorm layer after scale reparameterization; r₁, r₂ are the scale reparameterization factors, computed from the channel-wise quantization scale s and zero point z and their layer-wise counterparts s̃, z̃; W, b are the original weight parameters of the following linear layer; and W̃, b̃ are the weight parameters after scale reparameterization.
4. The quantization method suitable for a visual self-attention model according to claim 1, characterized in that the loss function is:

$$\mathcal{L} = \left\| \mathbf{O}_l^{fp} - \mathbf{O}_l^{q} \right\|_2^{2}$$

wherein O_l^{fp} represents the output of the l-th module of the full-precision visual self-attention model, and O_l^{q} represents the output of the l-th module of the quantized visual self-attention model.



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination