CN115049055A

CN115049055A - Super-resolution neural network quantization method based on dynamic dual trainable bounds

Info

Publication number: CN115049055A
Application number: CN202210761410.2A
Authority: CN
Inventors: 纪荣嵘; 钟云山; 林明宝
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2022-06-29
Filing date: 2022-06-29
Publication date: 2022-09-13
Anticipated expiration: 2042-06-29
Also published as: CN115049055B

Abstract

A super-resolution neural network quantization method based on dynamic dual trainable bounds, relating to the compression and acceleration of artificial neural networks. The method comprises: 1) collecting the distributions of the maximum and minimum activation values of each layer of the super-resolution neural network; 2) selecting the P% of layers with the largest sum of maximum-value distribution variance and minimum-value distribution variance, applying to the activations of those layers a quantizer with trainable upper and lower bounds and a dynamic gate controller, and applying to the activations of the other layers a quantizer with trainable upper and lower bounds; 3) applying a quantizer with asymmetric upper and lower bounds to the network weights; 4) quantizing the neural network with these quantizers, initializing the weights of the dynamic gate controller, and training the quantized network with the L1 loss and the structural transfer loss until a preset number of training epochs is reached; 5) after training, keeping the weights of the quantized network to obtain the final quantized network.

Description

Super-resolution neural network quantization method based on dynamic dual trainable bounds

Technical Field

The invention relates to the compression and acceleration of artificial neural networks, and in particular to a super-resolution neural network quantization method based on dynamic dual trainable bounds.

Background

Single image super-resolution (SISR) is a classic and challenging research topic in low-level computer vision. Its goal is to reconstruct a high-resolution (HR) image from a given low-resolution (LR) image. The rise of deep convolutional neural networks (DCNNs) in recent years has driven many advances in SISR. Looking back at the development of DCNNs for SISR, record-breaking performance has usually come with a dramatic increase in model complexity. Starting from SRCNN (Dong C, Loy C C, He K, et al. Learning a deep convolutional network for image super-resolution [C]// European conference on computer vision. Springer, Cham, 2014: 184-.), later models grew much larger. EDSR (Lim B, Son S, Kim H, et al. Enhanced deep residual networks for single image super-resolution [C]// Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2017: 136-144.) constructs a 64-layer CNN with 1.5M parameters. RDN (Zhang Y, Tian Y, Kong Y, et al. Residual dense network for image super-resolution [C]// Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 2472-2481.) is built from residual dense blocks and introduces 151 convolutional layers with 22M parameters. In addition, RDN requires approximately 5896G floating-point operations (FLOPs) to generate a 1920x1080 image (magnification factor x4). On the one hand, the high memory footprint and computational cost of DCNN-based SR models hinder their deployment on many resource-constrained platforms, such as smartphones, wearable devices and embedded devices. On the other hand, SR is in particular demand on exactly these devices, where users want to increase the resolution of a picture after taking it. Therefore, compressing DCNN-based SR models has received a great deal of attention from both academia and industry.

By discretizing the full-precision weights and activations within a DCNN, network quantization has become one of the most promising compression techniques. It reduces memory storage through low-precision representation and reduces computational cost through more efficient integer arithmetic. Developing quantization methods specialized for DCNN-based SR models has therefore attracted increasing attention in the research community in recent years. For example, PAMS (Li H, Yan C, Lin S, et al. PAMS: Quantized super-resolution via parameterized max scale [C]// European Conference on Computer Vision. Springer, Cham, 2020: 564-580.) designs a layer-wise quantizer with a learnable clipping value to handle the wide range of activations, but suffers severe performance degradation in ultra-low precision settings (e.g., 2-bit and 3-bit). A more recent study is DAQ (Hong C, Kim H, Baik S, et al. DAQ: Channel-Wise Distribution-Aware Quantization for Deep Image Super-Resolution Networks [C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2022: 2675-.). Despite these advances, the performance improvement comes at the cost of the heavy overhead of normalizing and de-normalizing feature maps, as well as expensive per-channel quantizers.

Therefore, existing methods either incur significant additional cost or suffer significant performance degradation when performing ultra-low precision quantization.

Disclosure of Invention

The invention aims to provide a super-resolution neural network quantization method based on dynamic dual trainable bounds, addressing the performance drop of current super-resolution neural networks under low-bit quantization. Only the layer-wise quantizers designed by the invention need to be applied, the quantized network is obtained by training directly from the start, and its performance is higher.

The invention comprises the following steps:

1) collecting the distributions of the maximum and minimum activation values of each layer of the super-resolution neural network;

2) selecting the P% of layers with the largest sum of maximum-value distribution variance and minimum-value distribution variance, applying to the activations of those layers a quantizer with trainable upper and lower bounds and a dynamic gate controller, and applying to the activations of the other layers a quantizer with trainable upper and lower bounds;

3) applying a quantizer with asymmetric upper and lower bounds to the network weights;

4) quantizing the neural network with these quantizers, initializing the weights of the dynamic gate controller, and training the quantized network with the L1 loss and the structural transfer loss until a preset number of training epochs is reached;

5) after training, keeping the weights of the quantized network to obtain the final quantized network.

In step 1), the distributions of the maximum and minimum activation values of each layer of the super-resolution neural network are collected by forwarding the training-set images through a pre-trained super-resolution neural network and recording, for each layer, the maximum and minimum activation values; the pre-trained super-resolution neural network is a network model that has already been trained on the target data set.

In step 2), the specific method for selecting the P% of layers with the largest sum of maximum-value distribution variance and minimum-value distribution variance, applying to the activations of those layers a quantizer with trainable upper and lower bounds and a dynamic gate controller, and applying to the activations of the other layers a quantizer with trainable upper and lower bounds is as follows: using the per-layer maximum-value distribution obtained in step 1), compute the variance of the distribution, which gives the variance of the maximum-value distribution of the l-th layer; using the per-layer minimum-value distribution obtained in step 1), compute the variance of the distribution, which gives the variance of the minimum-value distribution of the l-th layer; add the maximum-value and minimum-value distribution variances of the l-th layer and denote the sum $DI^l$.

The P% of layers with the largest $DI^l$ are given the quantizer with trainable upper and lower bounds and a dynamic gate controller, and the other layers are given the quantizer with trainable upper and lower bounds;

wherein the quantizer with trainable upper and lower bounds is designed as follows:

$$q = \mathrm{round}\left(\frac{\mathrm{clip}(F, \alpha_l, \alpha_u)}{s}\right), \qquad \mathrm{clip}(F, \alpha_l, \alpha_u) = \min(\max(F, \alpha_l), \alpha_u)$$

wherein $\alpha_l$, $\alpha_u$ denote the trainable lower and upper bounds, respectively; $F$ denotes an activation value of the network; round denotes rounding its input to the nearest integer;

$$s = \frac{\alpha_u - \alpha_l}{2^b - 1}$$

is a scaling factor for converting between full-precision numbers and integers, and $b$ denotes the quantization bit width;

the dynamic gate controller is designed as follows:

$\beta_l, \beta_u = 2 \cdot \mathrm{Sigmoid}(\mathrm{Conv2}(\mathrm{BN}(\mathrm{Conv1}(\mathrm{AvgPooling}(F)))))$

wherein AvgPooling denotes average pooling, which pools the input feature map $F$ from $C \times H \times W$ to $C \times 1 \times 1$; Conv1 is a 1x1 convolution with 32 output channels; BN denotes a batch normalization layer; Conv2 is a 1x1 convolution with 2 output channels; the result is finally passed through a Sigmoid function and multiplied by 2, which yields two adjustment coefficients $\beta_l$, $\beta_u$ with a value range of [0, 2]; $\alpha_l$ and $\alpha_u$ are multiplied by $\beta_l$ and $\beta_u$, respectively, so that the trainable upper and lower bounds are adjusted dynamically based on the input feature map $F$; the quantizer with trainable upper and lower bounds and a dynamic gate controller is then:

$$q = \mathrm{round}\left(\frac{\mathrm{clip}(F, \alpha'_l, \alpha'_u)}{s}\right), \qquad s = \frac{\alpha'_u - \alpha'_l}{2^b - 1}$$

wherein $\alpha'_l = \alpha_l \cdot \beta_l$ and $\alpha'_u = \alpha_u \cdot \beta_u$; note that $\alpha_u$ and $\alpha_l$ are trainable; for the activation values, a layer-by-layer quantization approach is used.

In step 3), the quantizer with asymmetric upper and lower bounds is defined as follows:

$$q = \mathrm{round}\left(\frac{\mathrm{clip}(W, w_l, w_u)}{s}\right)$$

wherein $w_l$, $w_u$ are taken as the (100-M)-th percentile and the M-th percentile of the weights, respectively; $W$ denotes the weights of the network; round denotes rounding its input to the nearest integer;

$$s = \frac{w_u - w_l}{2^b - 1}$$

is a scaling factor for converting between full-precision numbers and integers, and $b$ denotes the quantization bit width; for the weights, a layer-by-layer quantization approach is used.

In step 4), the neural network is quantized with the above quantizers, the weights of the dynamic gate controller are initialized, and the quantized network is trained with the L1 loss and the structural transfer loss until a preset number of training epochs is reached, wherein:

the L1 loss:

the structural transfer loss:

wherein $F'_S$, $F'_T$ are the structural features of the activations of the full-precision network and of the quantized network, respectively, computed as follows:

wherein $F \in \mathbb{R}^{C \times H \times W}$ is the output of the high-level feature extraction module;

the overall loss function:

$L = L_1 + 1000 L_{SKT}$

The invention can be applied to convolutional neural networks in the field of image super-resolution. Compared with the prior art, it has the following outstanding advantages:

Extensive experiments show that the proposed super-resolution neural network quantization method based on dynamic dual trainable bounds is simple to implement, adds little computation and parameter overhead, greatly improves performance, and outperforms mainstream quantization methods, particularly when all layers are quantized to very low bit widths.

Drawings

FIG. 1 shows the dynamic gate controller of the present invention.

FIG. 2 is a block diagram of the algorithm of the present invention.

Detailed Description

The invention aims to provide a super-resolution neural network quantization method based on dynamic dual trainable bounds, addressing the performance drop of current super-resolution neural networks under low-bit quantization. It relates to the compression and acceleration of artificial neural networks.

An algorithm framework diagram of an embodiment of the invention is shown in fig. 2.

1. Description of the symbols

$F(W^1, W^2, \ldots, W^L)$ denotes a full-precision convolutional neural network (CNN) with $L$ layers, where $W^i$ denotes the i-th convolutional layer. The number of convolution kernels of the i-th convolutional layer is $out^i$, and the kernel weights of this layer can be expressed as:

$$W^i = \{W^i_1, W^i_2, \ldots, W^i_{out^i}\}$$

wherein $W^i_j$ denotes the j-th convolution kernel of the i-th convolutional layer, and each kernel satisfies

$$W^i_j \in \mathbb{R}^{in^i \times width^i \times height^i}$$

wherein $in^i$, $width^i$, $height^i$ are the number of input channels of the i-th layer and the width and height of its convolution kernels, respectively. Given the input $A^{i-1}$ of the i-th convolutional layer (i.e., the output of the previous layer), the convolution result of the i-th convolutional layer can be expressed as:

$$O^i_j = W^i_j \circledast A^{i-1}$$

wherein $O^i_j$ is the j-th channel of the i-th convolution result, collecting all channels gives $O^i$, and $\circledast$ denotes the convolution operation. The convolution result then passes through an activation function to give the final output activation of the layer:

$A^i = \sigma(O^i)$

wherein $\sigma$ denotes the activation function.

The goal of the quantization algorithm is to obtain a neural network that can operate at low bit widths, in which the convolution is computed between the quantized j-th convolution kernel of the i-th layer and the quantized input of the i-th layer. The quantization algorithm thus yields an L-layer low-precision convolutional neural network in which every convolutional layer has been quantized.

To obtain the quantized network, the pre-trained full-precision network is quantized as follows:

$$q = \mathrm{round}\left(\frac{\mathrm{clip}(m, l, u)}{s}\right), \qquad \mathrm{clip}(m, l, u) = \min(\max(m, l), u)$$

wherein $l$ and $u$ denote the lower and upper clipping bounds, $m$ denotes a full-precision input, which may be a network weight $W$ or an activation value $A$, and round denotes rounding its input to the nearest integer.

$$s = \frac{u - l}{2^b - 1}$$

is a scaling factor for converting between full-precision numbers and integers, and $b$ denotes the quantization bit width. For the weights, a channel-by-channel quantization approach is used, i.e., each output channel has its own upper and lower clipping bounds and scaling factor. For the activation values, a layer-by-layer quantization approach is used, i.e., each layer shares the same upper and lower clipping bounds and scaling factor. After the quantized value $q$ is obtained, it can be de-quantized with the scaling factor,

$$\bar{m} = s \cdot q$$

and then used in computation. For the convolution of two quantized values, one can use:

$$\mathrm{conv}(s_1 q_1, s_2 q_2) = s_1 s_2 \cdot \mathrm{conv}(q_1, q_2)$$

wherein $s_1$, $s_2$ can be pre-computed and stored, and $q_1$, $q_2$ are both low-precision values, so the original full-precision operation can be replaced by a low-precision convolution alone.
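The factorization of the scaling factors can be checked numerically. The following is a minimal sketch that uses a dot product in place of a full convolution and assumes the clip, round and rescale form of the quantizer described above; the helper names `quantize` and `dot`, the bounds and the sample values are illustrative and not taken from the patent.

```python
def quantize(values, l, u, bits):
    """Return integer codes and the scaling factor for a list of values."""
    s = (u - l) / (2 ** bits - 1)                      # scaling factor
    codes = [round(min(max(v, l), u) / s) for v in values]
    return codes, s


def dot(a, b):
    """Dot product, standing in for one output position of a convolution."""
    return sum(x * y for x, y in zip(a, b))


# Made-up activations and weights, both quantized to 3 bits.
activations = [0.3, 1.7, 2.9, 0.0]
weights = [-0.9, 0.4, 1.1, -0.2]

q1, s1 = quantize(activations, 0.0, 3.5, bits=3)       # s1 = 0.5
q2, s2 = quantize(weights, -2.0, 1.5, bits=3)          # s2 = 0.5

# Integer-only accumulation, rescaled once by the pre-computed s1 * s2.
integer_dot = dot(q1, q2)
fast = s1 * s2 * integer_dot

# Reference: de-quantize every value first, then accumulate in floating point.
slow = dot([s1 * q for q in q1], [s2 * q for q in q2])
```

Both paths give the same result, which is why only the integer accumulation needs to run at inference time while `s1 * s2` is stored once per layer or channel.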

2. Super-resolution neural network quantization based on dynamic dual trainable bounds

Existing quantization methods for super-resolution neural networks all use symmetric quantizers for the activation values, and performance degrades significantly under low-bit quantization. To improve the performance of the quantized network, the invention discloses a super-resolution neural network quantization method based on dynamic dual trainable bounds. The dual trainable bounds can accommodate both symmetric activations (by setting $\alpha_l = -\alpha_u$) and asymmetric activations. To further accommodate dynamically changing activation values, the invention also provides a dynamic gate controller that adjusts the clipping thresholds dynamically based on the input. Under low-bit quantization, only the layer-wise quantizers designed by the invention need to be applied, the quantized network is obtained by training directly from the start, and its performance is higher.

3. Description of the training

The invention comprises the following steps:

2) selecting the P% of layers with the largest sum of maximum-value distribution variance and minimum-value distribution variance, applying to the activations of those layers a quantizer with trainable upper and lower bounds and a dynamic gate controller, and applying to the activations of the other layers a quantizer with trainable upper and lower bounds;

In step 1), the distributions of the maximum and minimum activation values of each layer of the super-resolution neural network are collected by forwarding the training-set images through a pre-trained super-resolution neural network and recording, for each layer, the maximum and minimum activation values. The pre-trained super-resolution neural network is a network model that has already been trained on the target data set.

In step 2), the P% of layers with the largest sum of maximum-value distribution variance and minimum-value distribution variance are selected and given the quantizer with trainable upper and lower bounds and a dynamic gate controller, and the other layers are given the quantizer with trainable upper and lower bounds. Using the per-layer maximum-value distribution obtained in step 1), the variance of the distribution is computed, which gives the variance of the maximum-value distribution of the l-th layer. Using the per-layer minimum-value distribution obtained in step 1), the variance of the distribution is computed, which gives the variance of the minimum-value distribution of the l-th layer. The maximum-value and minimum-value distribution variances of the l-th layer are added, and the sum is denoted $DI^l$.

The P% of layers with the largest $DI^l$ are given the quantizer with trainable upper and lower bounds and a dynamic gate controller, and the other layers are given the quantizer with trainable upper and lower bounds.
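The layer selection of steps 1) and 2) can be sketched in a few lines. This is an illustration under stated assumptions rather than the patented implementation: it uses the population variance, rounds the number of selected layers up, and keeps at least one layer; the function names and the toy statistics are hypothetical.

```python
import math
from statistics import pvariance


def dynamic_index(max_values, min_values):
    """Sum of the variances of a layer's per-image activation maxima and minima."""
    return pvariance(max_values) + pvariance(min_values)


def select_gated_layers(layer_stats, p_percent):
    """Return the indices of the P% of layers with the largest dynamic index.

    layer_stats holds one (max_values, min_values) pair per layer, each list
    containing one recorded value per training image.
    """
    scores = [dynamic_index(mx, mn) for mx, mn in layer_stats]
    # Assumption: round the layer count up and always keep at least one layer.
    k = max(1, math.ceil(len(scores) * p_percent / 100))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Made-up statistics for four layers, three images each.
stats = [
    ([1.0, 1.1, 0.9], [-1.0, -1.0, -1.0]),    # nearly constant range
    ([2.0, 6.0, 10.0], [-1.0, -5.0, -9.0]),   # strongly input-dependent range
    ([3.0, 3.0, 3.0], [0.0, 0.0, 0.0]),       # constant range
    ([1.0, 2.0, 3.0], [-0.5, -0.5, -0.5]),    # moderately varying maximum
]
gated = select_gated_layers(stats, p_percent=50)
```

With these numbers the second and fourth layers have the largest dynamic index, so they would receive the quantizer with a dynamic gate controller and the remaining layers the plain trainable-bound quantizer.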

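The quantizer with trainable upper and lower bounds is designed as follows:

$$q = \mathrm{round}\left(\frac{\mathrm{clip}(F, \alpha_l, \alpha_u)}{s}\right), \qquad \mathrm{clip}(F, \alpha_l, \alpha_u) = \min(\max(F, \alpha_l), \alpha_u)$$

wherein $\alpha_l$, $\alpha_u$ denote the trainable lower and upper bounds, respectively. $F$ denotes an activation value of the network. round denotes rounding its input to the nearest integer.

$$s = \frac{\alpha_u - \alpha_l}{2^b - 1}$$

is a scaling factor for converting between full-precision numbers and integers, and $b$ denotes the quantization bit width.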
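As a concrete illustration of the trainable-bound quantizer, the sketch below quantizes single activation values. It assumes the clip, round and rescale form implied by the definitions above (no zero-point offset), treats the bounds as plain numbers, and omits gradient handling for training; the function name `quantize_activation` is hypothetical.

```python
def quantize_activation(f, alpha_l, alpha_u, bits):
    """Quantize one activation value with lower and upper clipping bounds.

    Returns the integer code and the de-quantized value.
    """
    s = (alpha_u - alpha_l) / (2 ** bits - 1)    # scaling factor
    clipped = min(max(f, alpha_l), alpha_u)      # clip to [alpha_l, alpha_u]
    q = round(clipped / s)                       # Python rounds exact halves to even
    return q, q * s


# Asymmetric bounds [-1, 2] at 2 bits give the four levels -1, 0, 1, 2.
inside = quantize_activation(0.4, -1.0, 2.0, bits=2)
above = quantize_activation(5.0, -1.0, 2.0, bits=2)
below = quantize_activation(-3.0, -1.0, 2.0, bits=2)
```

Because the lower and upper bounds are independent, the same function covers the symmetric case by passing `alpha_l = -alpha_u`.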

The dynamic gate controller is designed as follows:

$\beta_l, \beta_u = 2 \cdot \mathrm{Sigmoid}(\mathrm{Conv2}(\mathrm{BN}(\mathrm{Conv1}(\mathrm{AvgPooling}(F)))))$

where AvgPooling denotes average pooling, which pools the input feature map $F$ from $C \times H \times W$ to $C \times 1 \times 1$; Conv1 is a 1x1 convolution with 32 output channels; BN denotes a batch normalization layer; Conv2 is a 1x1 convolution with 2 output channels; the result is finally passed through a Sigmoid function and multiplied by 2. The flow is shown in FIG. 1.

This yields two adjustment coefficients $\beta_l$, $\beta_u$ with a value range of [0, 2]. $\alpha_l$ and $\alpha_u$ are multiplied by $\beta_l$ and $\beta_u$, respectively, so that the trainable upper and lower bounds are adjusted dynamically based on the input feature map $F$. The quantizer with trainable upper and lower bounds and a dynamic gate controller is then:

$$q = \mathrm{round}\left(\frac{\mathrm{clip}(F, \alpha'_l, \alpha'_u)}{s}\right), \qquad s = \frac{\alpha'_u - \alpha'_l}{2^b - 1}$$

wherein $\alpha'_l = \alpha_l \cdot \beta_l$ and $\alpha'_u = \alpha_u \cdot \beta_u$. Note that $\alpha_u$ and $\alpha_l$ are trainable. For the activation values, a layer-by-layer quantization approach is used.
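The gate computation can be traced on a toy input. The sketch below is written with plain Python lists and rests on several assumptions that the text does not state: batch normalization is applied in inference form with given running statistics and no affine terms, the convolutions carry biases, and Conv2 is zero-initialized so that both coefficients start at 1. All function and parameter names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def linear(vec, weight, bias):
    """A 1x1 convolution on a C x 1 x 1 tensor is a matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def dynamic_gate(feature, conv1_w, conv1_b, bn_mean, bn_var, conv2_w, conv2_b, eps=1e-5):
    """Return (beta_l, beta_u) for a C x H x W feature map given as nested lists."""
    # AvgPooling: C x H x W -> C x 1 x 1
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    hidden = linear(pooled, conv1_w, conv1_b)                 # Conv1: C -> 32
    # BN in inference form with assumed running statistics.
    hidden = [(h - m) / math.sqrt(v + eps) for h, m, v in zip(hidden, bn_mean, bn_var)]
    logits = linear(hidden, conv2_w, conv2_b)                 # Conv2: 32 -> 2
    beta_l, beta_u = (2.0 * sigmoid(z) for z in logits)       # scale into (0, 2)
    return beta_l, beta_u


# Toy 3 x 2 x 2 feature map and made-up gate parameters.
feature = [[[0.0, 1.0], [2.0, 3.0]] for _ in range(3)]
conv1_w = [[0.1] * 3 for _ in range(32)]
conv1_b = [0.0] * 32
bn_mean, bn_var = [0.0] * 32, [1.0] * 32
conv2_w = [[0.0] * 32 for _ in range(2)]   # assumed zero init: both gates equal 1
conv2_b = [0.0, 0.0]

beta_l, beta_u = dynamic_gate(feature, conv1_w, conv1_b, bn_mean, bn_var, conv2_w, conv2_b)
alpha_l, alpha_u = -1.0, 2.0
dynamic_bounds = (alpha_l * beta_l, alpha_u * beta_u)
```

With zero Conv2 weights the gate leaves the trainable bounds unchanged; once Conv2 is trained, each input image can shrink or stretch either bound by a factor between 0 and 2.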

In step 3), a quantizer with asymmetric upper and lower bounds is applied to the network weights, and this quantizer is defined as follows:

$$q = \mathrm{round}\left(\frac{\mathrm{clip}(W, w_l, w_u)}{s}\right)$$

wherein $w_l$, $w_u$ are taken as the (100-M)-th percentile and the M-th percentile of the weights, respectively. $W$ denotes the weights of the network. round denotes rounding its input to the nearest integer.

$$s = \frac{w_u - w_l}{2^b - 1}$$

is a scaling factor for converting between full-precision numbers and integers, and $b$ denotes the quantization bit width. For the weights, a layer-by-layer quantization approach is used.
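The percentile-based weight bounds can be illustrated as follows. This sketch assumes linear interpolation between closest ranks for the percentile, which the text does not specify, and uses M = 75 only because the toy weight list is tiny (the embodiments use values such as 95 or 99); `percentile` and `quantize_weights` are hypothetical helper names.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between closest ranks (assumed convention)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def quantize_weights(weights, m_percent, bits):
    """Clip weights to their (100-M)-th and M-th percentiles, then quantize."""
    w_l = percentile(weights, 100 - m_percent)
    w_u = percentile(weights, m_percent)
    s = (w_u - w_l) / (2 ** bits - 1)            # scaling factor
    dequantized = [round(min(max(w, w_l), w_u) / s) * s for w in weights]
    return dequantized, (w_l, w_u)


# Made-up weights with one outlier at each end.
weights = [-6.0, -2.0, 0.0, 1.0, 7.0]
dequantized, bounds = quantize_weights(weights, m_percent=75, bits=2)
```

The two outliers are clipped to the percentile bounds, so the available quantization levels are spent on the bulk of the weight distribution, and the bounds need not be symmetric around zero.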

4) The neural network is quantized with the above quantizers, the weights of the dynamic gate controller are initialized, and the quantized network is trained with the L1 loss and the structural transfer loss until a preset number of training epochs is reached.

After both the weights and the activation values are quantized, the overall algorithm flow is as shown in FIG. 2.

L1 loss:

Structural transfer loss:

wherein $F'_S$, $F'_T$ are the structural features of the activations of the full-precision network and of the quantized network, respectively, which can be computed as follows:

wherein $F \in \mathbb{R}^{C \times H \times W}$ is the output of the high-level feature extraction module.

Overall loss function:

$L = L_1 + 1000 L_{SKT}$
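The way the two loss terms combine can be sketched as below. The exact formulas for the individual losses are not reproduced in this text, so the sketch makes explicit assumptions: the L1 loss is the mean absolute pixel error, the structural feature is the channel-wise sum of squared activations normalized to unit L2 norm, and the structural transfer loss is the L2 distance between the two structural features. Only the weighting factor 1000 comes from the text; all function names are hypothetical.

```python
import math


def l1_loss(sr, hr):
    """Mean absolute error between two equally sized flat pixel lists."""
    return sum(abs(a - b) for a, b in zip(sr, hr)) / len(sr)


def structure_feature(feature):
    """Assumed form: per-position sum of squared activations over channels, L2-normalized."""
    h, w = len(feature[0]), len(feature[0][0])
    energy = [sum(ch[y][x] ** 2 for ch in feature) for y in range(h) for x in range(w)]
    norm = math.sqrt(sum(e * e for e in energy)) or 1.0
    return [e / norm for e in energy]


def skt_loss(feat_a, feat_b):
    """Assumed form: L2 distance between the two structural features."""
    a, b = structure_feature(feat_a), structure_feature(feat_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def total_loss(sr, hr, feat_quant, feat_full, weight=1000.0):
    """Overall loss L = L1 + 1000 * L_SKT."""
    return l1_loss(sr, hr) + weight * skt_loss(feat_quant, feat_full)


# Made-up super-resolved and ground-truth pixels, and a 1 x 1 x 2 feature map.
sr, hr = [0.0, 1.0], [0.5, 0.5]
feat = [[[1.0, 2.0]]]
scaled = [[[2.0, 4.0]]]

loss_same_features = total_loss(sr, hr, feat, feat)
structure_gap = skt_loss(feat, scaled)
```

Under these assumptions the structural term ignores a global rescaling of the feature map and only penalizes differences in its spatial structure, which is why a large weight such as 1000 can be needed for it to matter next to the L1 term.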

4. Implementation details

All models were trained on the DIV2K training set, which contains 800 images (Timofte R, Agustsson E, Van Gool L, et al. NTIRE 2017 challenge on single image super-resolution: Methods and results [C]// Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2017: 114-125.), and tested on four standard benchmarks: Set5 (Bevilacqua M, Roumy A, Guillemot C, et al. Low-complexity single-image super-resolution based on nonnegative neighbor embedding [J]. 2012.), Set14 (Ledig C, Theis L, et al.), BSD100 (Martin D, Fowlkes C, Tal D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics [C]// Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001. IEEE, 2001, 2: 416-423.) and Urban100 (Huang J B, Singh A, Ahuja N. Single image super-resolution from transformed self-exemplars [C]// Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 5197-5206.). Two magnification factors, x2 and x4, were evaluated.

The quantized SR models include EDSR (Lim B, Son S, Kim H, et al. Enhanced deep residual networks for single image super-resolution [C]// Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2017: 136-144.), RDN (Zhang Y, Tian Y, Kong Y, et al. Residual dense network for image super-resolution [C]// Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 2472-2481.) and SRResNet (Ledig C, Theis L, Huszár F, et al. Photo-realistic single image super-resolution using a generative adversarial network [C]// Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 4681-4690.). They are quantized to 4, 3 and 2 bits and compared with the state-of-the-art competitors DoReFa (Zhou S, Wu Y, Ni Z, et al. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients [J]. arXiv preprint arXiv:1606.06160, 2016.), TensorFlow Lite (TF Lite) (Jacob B, Kligys S, Chen B, et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference [C]// Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 2704-2713.), PACT (Choi J, Wang Z, Venkataramani S, et al. PACT: Parameterized clipping activation for quantized neural networks [J]. arXiv preprint arXiv:1805.06085, 2018.) and PAMS (Li H, Yan C, Lin S, et al. PAMS: Quantized super-resolution via parameterized max scale [C]// European Conference on Computer Vision. Springer, Cham, 2020: 564-580.).

PSNR and SSIM (Wang Z, Bovik A C, Sheikh H R, et al. Image quality assessment: from error visibility to structural similarity [J]. IEEE transactions on image processing, 2004, 13(4): 600-612.) on the Y channel are used as the evaluation metrics. For the quantized models, the weights and activations of the high-level feature extraction module are quantized, while the low-level feature extraction module and the reconstruction module are kept at full precision. The batch size is set to 16 and the optimizer is Adam (Kingma D P, Ba J. Adam: A method for stochastic optimization [J]. arXiv preprint arXiv:1412.6980, 2014.) with $\beta_1 = 0.9$, $\beta_2 = 0.999$ and $\epsilon = 10^{-8}$. The initial learning rate is set to $10^{-4}$ and halved every 10 epochs. For EDSR, the gate ratio P is set to 30 and the initialization factor M to 99. For RDN, P and M are 50 and 95, respectively. For SRResNet, P is 10 and M is 99. The total number of training epochs is set to 60. The training images are preprocessed by subtracting the mean RGB value. During training, random horizontal flipping and vertical rotation are used for data augmentation. All experiments are implemented in PyTorch (Paszke A, Gross S, Massa F, et al. PyTorch: An imperative style, high-performance deep learning library [J]. Advances in neural information processing systems, 2019, 32.).
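The learning-rate schedule stated above can be written out explicitly. This small sketch assumes that epochs are counted from 0 and that the halving takes effect at the start of every tenth epoch; the function name `learning_rate` is hypothetical.

```python
def learning_rate(epoch, base_lr=1e-4, step=10, factor=0.5):
    """Step schedule: start at base_lr and multiply by factor every `step` epochs."""
    return base_lr * factor ** (epoch // step)


# Learning rate for each of the 60 training epochs.
schedule = [learning_rate(epoch) for epoch in range(60)]
```

In PyTorch the same behaviour is commonly obtained with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)`, although the text does not say which scheduler was actually used.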

5. Field of application

The invention can be applied to super-resolution convolutional neural networks to compress and accelerate them.

Tables 1, 2 and 3 show the quantization results for EDSR, RDN and SRResNet on different datasets, respectively. The invention (DDTB) consistently outperforms all compared methods on these quantized SR models at every bit width.

TABLE 1

TABLE 2

TABLE 3

For EDSR at 4 bits, the invention is considerably better than PAMS. For example, for 4-bit EDSR x4, the invention achieves a PSNR gain of 0.37 dB on Urban100. A more significant improvement can be observed under ultra-low-bit quantization. For example, when EDSR x4 is quantized to 2 bits, the invention achieves performance gains of 0.94 dB, 0.66 dB, 0.36 dB and 0.70 dB on Set5, Set14, BSD100 and Urban100, respectively.

For RDN, when the model is quantized to 4 bits, the invention is slightly better than PACT. Its advantage is particularly evident at ultra-low bit widths. Specifically, for 2-bit RDN x4, the performance gains of the invention on Set5, Set14, BSD100 and Urban100 are 0.64 dB, 0.45 dB, 0.26 dB and 0.51 dB, respectively.

The results of SRResNet also show that the performance of the invention is improved more obviously under ultra-low precision. For 2-bit SRResNetx4, the performance of the invention on Set5, Set14, BSD100 and Urban100 is improved by 0.65dB, 0.47dB, 0.30dB and 0.69dB, respectively, while for 2-bit SRResNetx2, the performance gains are 1.15dB, 0.79dB, 0.67dB and 1.80dB, respectively.

The above-described embodiments are merely preferred embodiments of the present invention, and should not be construed as limiting the scope of the invention. All equivalent changes and modifications made within the scope of the present invention shall fall within the scope of the present invention.

Claims

1. A super-resolution neural network quantization method based on dynamic dual trainable bounds, characterized by comprising the following steps:

2. The super-resolution neural network quantization method based on dynamic dual trainable bounds according to claim 1, wherein in step 1), the distributions of the maximum and minimum activation values of each layer of the super-resolution neural network are collected by forwarding the training-set images through a pre-trained super-resolution neural network and recording, for each layer, the maximum and minimum activation values; the pre-trained super-resolution neural network is a network model that has already been trained on the target data set.

3. The super-resolution neural network quantization method based on dynamic dual trainable bounds according to claim 1, wherein in step 2), the specific method for selecting the P% of layers with the largest sum of maximum-value distribution variance and minimum-value distribution variance, applying to the activations of those layers a quantizer with trainable upper and lower bounds and a dynamic gate controller, and applying to the activations of the other layers a quantizer with trainable upper and lower bounds is as follows: using the per-layer maximum-value distribution obtained in step 1), compute the variance of the distribution;

the P% of layers with the largest $DI^l$ are given the quantizer with trainable upper and lower bounds and a dynamic gate controller, and the other layers are given the quantizer with trainable upper and lower bounds.

4. The super-resolution neural network quantization method based on dynamic dual trainable bounds according to claim 3, wherein the quantizer with trainable upper and lower bounds is designed as follows:

wherein $\alpha_l$, $\alpha_u$ denote the trainable lower and upper bounds, respectively; $F$ denotes an activation value of the network; round denotes rounding its input to the nearest integer;

5. The super-resolution neural network quantization method based on dynamic dual trainable bounds according to claim 3, wherein the dynamic gate controller is designed as follows:

$\beta_l, \beta_u = 2 \cdot \mathrm{Sigmoid}(\mathrm{Conv2}(\mathrm{BN}(\mathrm{Conv1}(\mathrm{AvgPooling}(F)))))$

wherein AvgPooling denotes average pooling, which pools the input feature map $F$ from $C \times H \times W$ to $C \times 1 \times 1$; Conv1 is a 1x1 convolution with 32 output channels; BN denotes a batch normalization layer; Conv2 is a 1x1 convolution with 2 output channels; the result is finally passed through a Sigmoid function and multiplied by 2, which yields two adjustment coefficients $\beta_l$, $\beta_u$ with a value range of [0, 2]; $\alpha_l$ and $\alpha_u$ are multiplied by $\beta_l$ and $\beta_u$, respectively, so that the trainable upper and lower bounds are adjusted dynamically based on the input feature map $F$.

6. The super-resolution neural network quantization method based on dynamic dual trainable bounds according to claim 1, wherein in step 2), the quantizer with trainable upper and lower bounds and a dynamic gate controller is:

wherein $\alpha'_l = \alpha_l \cdot \beta_l$ and $\alpha'_u = \alpha_u \cdot \beta_u$; note that $\alpha_u$ and $\alpha_l$ are trainable; for the activation values, a layer-by-layer quantization approach is used.

7. The super-resolution neural network quantization method based on dynamic dual trainable bounds according to claim 1, wherein in step 3), the quantizer with asymmetric upper and lower bounds is defined as follows:

wherein $w_l$, $w_u$ are taken as the (100-M)-th percentile and the M-th percentile of the weights, respectively; $W$ denotes the weights of the network; round denotes rounding its input to the nearest integer;

8. The super-resolution neural network quantization method based on dynamic dual trainable bounds according to claim 1, wherein in step 4), the neural network is quantized with the quantizers, the weights of the dynamic gate controller are initialized, and the quantized network is trained with the L1 loss and the structural transfer loss until a preset number of training epochs is reached, wherein:

the L1 loss:

the structural transfer loss:

wherein $F \in \mathbb{R}^{C \times H \times W}$ is the output of the high-level feature extraction module;

the overall loss function:

$L = L_1 + 1000 L_{SKT}$.