Disclosure of Invention
The invention provides a quantization pruning technique for a defogging model, addressing the prior-art drawback that hardware deployment of a defogging model based on a convolutional neural network consumes substantial hardware resources.
To solve this technical problem, the invention adopts the following technical scheme:
a quantization pruning method of a defogging model, comprising the following steps:
acquiring original parameter data of a defogging model, and determining boundary data based on the original parameter data;
acquiring a fog-containing image;
acquiring a plurality of quantization bit numbers which are configured in advance;
based on the boundary data and the original parameter data, determining an optimal quantization bit number, and based on the optimal quantization bit number, performing quantization pruning on the defogging model, wherein the method specifically comprises the following steps:
based on the boundary data and the original parameter data, calculating quantization parameter data under the corresponding quantization bit number, and performing inverse quantization calculation on the quantization parameter data to obtain corresponding inverse quantization parameter data; the quantization parameter data is used for performing quantization pruning, and the inverse quantization parameter data is used for simulating the defogging effect of a quantization model after quantization pruning is performed on the basis of the quantization parameter data;
generating a corresponding defogging test model based on the inverse quantization parameter data, and performing a defogging effect test on the defogging test model by using the fog-containing image to obtain a corresponding test result, namely checking whether the defogging test model's processing of the fog-containing image achieves a preset effect;
and determining the optimal quantization bit number based on the obtained test result, extracting quantization parameter data corresponding to the optimal quantization bit number, and performing quantization pruning on the defogging model based on the quantization parameter data.
In contrast to the fixed bit number used in prior quantization technology, the invention sets a plurality of quantization bit numbers and screens out the lowest quantization bit number that still meets the defogging-effect requirement; by exploiting the FPGA's ability to compute at an arbitrary bit width, hardware resources can be fully saved during model deployment.
Because precision is lost in the quantization process but not in the inverse quantization process, the original parameter data is quantized and then inversely quantized, and the resulting inverse quantization parameter data is used to simulate the defogging effect; in this way the optimal quantization bit number can be determined quickly and accurately before the defogging model is quantization-pruned and deployed on hardware.
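The simulation idea can be sketched as follows. This is a minimal symmetric example (no zero point) with illustrative weight values, not the full scheme of the embodiments:

```python
import numpy as np

# Minimal sketch of the simulation idea: rounding during quantization is the
# only source of precision loss, and dequantization adds none, so the
# dequantized float weights reproduce exactly the error that the deployed
# integer model will exhibit. Weight values here are illustrative.
w = np.array([-0.9, -0.2, 0.05, 0.4, 0.87], dtype=np.float32)
bits = 4
q_max, q_min = 2 ** (bits - 1) - 1, -(2 ** (bits - 1))
scale = (float(w.max()) - float(w.min())) / (q_max - q_min)
q = np.clip(np.round(w / scale), q_min, q_max)   # integer parameters (lossy step)
w_sim = (q * scale).astype(np.float32)           # dequantized test-model weights
# |w - w_sim| is bounded by scale / 2, so evaluating the defogging effect with
# w_sim predicts the behaviour of the quantized model before deployment.
```

Repeating this at each candidate bit width and testing the resulting model is exactly the selection loop described above.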
In one implementable manner, the specific steps of performing the defogging effect test on each defogging test model by using the fog-containing image to obtain the corresponding test result are as follows:
inputting the fog-containing image into the defogging model in advance to obtain a corresponding defogging reference image;
inputting the fog-containing image into a corresponding defogging test model, and outputting a corresponding defogging test image by the defogging test model;
and evaluating the image quality of the defogging test image directly, and/or evaluating it against the defogging reference image, to obtain the corresponding test result.
In one implementable manner, the specific steps of performing quantization pruning on the defogging model based on the quantization parameter data are as follows:
counting zero-value parameters based on the quantization parameter data to obtain zero-value distribution data and a zero rate, wherein the zero rate is used for indicating the proportion of the zero-value parameters, and the zero rate = the number of the zero-value parameters/the total quantity of the parameters;
when the zero rate is higher than a preset zero rate threshold value, quantizing the defogging model based on the quantized parameter data and the zero value distribution data, performing core-level and channel-level structured pruning, and outputting the obtained quantized model;
if kernel-level or channel-level structured pruning cannot be carried out, quantizing the defogging model based on the quantization parameter data and outputting the obtained quantized model;
in the actual pruning process, whether structured pruning can be carried out is judged based on the zero-value distribution data;
otherwise, quantizing the defogging model based on the quantization parameter data and outputting the obtained quantized model.
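A compact sketch of this decision logic; the zero-rate threshold value and the weight layout `(out_channels, in_channels, kH, kW)` are illustrative assumptions:

```python
import numpy as np

def pruning_decision(q_weight, zero_rate_threshold=0.3):
    """Decide between quantize-only and quantize-plus-structured-pruning.

    q_weight: quantized convolution weights, assumed shaped
    (out_channels, in_channels, kH, kW); the threshold is illustrative.
    """
    rate = np.count_nonzero(q_weight == 0) / q_weight.size   # zero rate
    if rate <= zero_rate_threshold:
        return "quantize"                  # low redundancy: no pruning needed
    # zero-value distribution data: which kernels / output channels are all-zero
    kernel_zero = np.all(q_weight == 0, axis=(2, 3))         # per (out, in) kernel
    channel_zero = np.all(q_weight == 0, axis=(1, 2, 3))     # per output channel
    if kernel_zero.any() or channel_zero.any():
        return "quantize+prune"            # structured pruning is possible
    return "quantize"                      # zeros are scattered: quantize only
```

For example, a layer with half of its output channels entirely zero yields `"quantize+prune"`, while the same zero rate scattered across kernels yields `"quantize"`.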
The commonly used pruning approach today is iterative pruning, which supports only a single granularity, i.e. it cannot perform kernel-level and channel-level structured pruning simultaneously, and it is time-consuming.
In addition, after pruning (such as kernel level or channel level) under different granularities, hardware implementation of a model after pruning is also a difficult point, so that channel level structured pruning is common at present;
The invention performs quantization pruning on a defogging model based on a convolutional neural network. Through the design of the zero rate and the zero-value distribution data, the redundancy degree of the parameters is judged from the zero rate, the distribution of zero-valued data at the kernel level and the channel level is judged from the zero-value distribution data, and an equivalent pruned model is reconstructed using existing model-reconstruction technology, realizing simultaneous kernel-level and channel-level structured pruning; this further saves hardware resources and improves deployment efficiency on hardware terminals.
As an implementable manner, the specific steps of calculating quantization parameter data under a corresponding quantization bit number and performing inverse quantization calculation on the quantization parameter data to obtain corresponding inverse quantization parameter data include:
determining a maximum quantization value and a minimum quantization value based on a target quantization bit number;
calculating based on the boundary data, the maximum quantization value and the minimum quantization value to obtain corresponding scale factor data;
calculating based on the boundary data, the maximum quantization value and the scale factor data to obtain corresponding zero point data;
carrying out quantization translation on the original parameter data based on the scale factor data and the zero point data to obtain corresponding quantization parameter data;
carrying out inverse quantization on the quantization parameter data to obtain corresponding inverse quantization parameter data;
the data types of the original parameter data, the boundary data and the inverse quantization parameter data are all single-precision floating point types, and the data types of the zero point data and the quantization parameter data are integers.
As an implementable embodiment:
the original parameter data comprises a plurality of unit parameter data;
the boundary data comprises a plurality of boundary pairs, the boundary pairs correspond to the unit parameter data one by one, and the boundary pairs comprise a first boundary and a second boundary;
the specific steps for acquiring the unit parameter data and the boundary pair are as follows:
dividing the defogging model by taking the convolutional layer as a dividing node to obtain a plurality of network units;
acquiring single-precision floating point parameters corresponding to each network unit to acquire corresponding unit parameter data;
based on each unit parameter data, a corresponding boundary pair is determined.
In the defogging model, the structure that extracts features is the convolutional layer; other parts such as pooling layers or activation functions are auxiliary rather than core computations, so the invention focuses on the convolutional part during quantization. Because each convolutional layer differs in importance and in data distribution, its boundary pair also differs, which is why the invention divides the defogging model using the convolutional layers as partition nodes.
Further, the specific step of determining the corresponding boundary pair based on the unit parameter data of the current network unit is as follows:
determining a first value interval and a second value interval based on the unit parameter data of the current network unit, wherein the first value interval is used for indicating the value range of a first boundary, and the second value interval is used for indicating the value range of a second boundary;
respectively taking the first value interval and the second value interval as target intervals;
dividing a plurality of distribution intervals based on the target interval, and performing iterative computation based on the distribution intervals and the unit parameter data to obtain quantization difference degrees corresponding to the unit parameter data in each distribution interval, wherein the distribution intervals all fall in the target interval, and the starting points of the distribution intervals are the same;
and determining a corresponding boundary value based on the distribution interval corresponding to the minimum quantization difference degree.
In the existing scheme, the maximum and minimum parameters of the unit parameter data are used as the boundary pair, without considering the influence of outliers on the quantization result;
if a symmetric boundary pair is obtained by iterating once over the distribution interval of the unit parameter data, the influence of outliers is considered, but a symmetric boundary does not fit the actually asymmetric distribution of model parameter data.
Further, the specific steps of calculating the quantization difference degree in the current iteration process are as follows:
based on the distribution interval, extracting single-precision floating point parameter data adopted in the current iteration process from the unit parameter data to obtain calculation data;
normalizing the calculated data to obtain corresponding original distribution data;
mapping the calculated data to obtain corresponding quantized data;
carrying out normalization processing on the quantized data to obtain corresponding quantized distribution data;
and calculating the difference degree based on the original distribution data and the quantized distribution data to obtain the corresponding quantized difference degree.
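One way to realize these steps is sketched below; the histogram bin count and the use of KL divergence as the difference degree are assumptions, since the text only specifies normalization and a difference computation:

```python
import numpy as np

def quantization_difference(calc_data, num_bits, eps=1e-10):
    """Difference degree between original and quantized distributions.

    Sketch: both distributions are normalized histograms over the same bins;
    the data is mapped onto 2**num_bits levels to simulate quantization, and
    KL divergence serves as the difference degree (an assumed choice).
    """
    levels = 2 ** num_bits
    hist, edges = np.histogram(calc_data, bins=levels)
    p = hist / hist.sum()                                  # original distribution data
    lo, hi = float(np.min(calc_data)), float(np.max(calc_data))
    scale = (hi - lo) / (levels - 1) if hi > lo else 1.0
    dq = np.round((calc_data - lo) / scale) * scale + lo   # mapped (quantized) data
    q_hist, _ = np.histogram(dq, bins=edges)
    q_dist = q_hist / q_hist.sum()                         # quantized distribution data
    return float(np.sum(p * np.log((p + eps) / (q_dist + eps))))
```

A coarser bit width distorts the distribution more, so its difference degree comes out larger, which is what the boundary search minimizes.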
Further, the specific step of determining the distribution interval adopted in the current iteration process is as follows:
dividing a target interval into a plurality of sub-intervals in advance;
determining the number of reference intervals based on the target quantization bit number;
determining the number of accumulated intervals based on the current iteration times and the number of reference intervals;
and extracting corresponding number of subintervals in sequence based on the accumulated interval number to obtain corresponding distribution intervals.
Further, the specific steps of determining the first value range and the second value range are as follows:
acquiring the minimum parameter r_min and the maximum parameter r_max in the unit parameter data;
based on the minimum parameter r_min and the maximum parameter r_max, determining a first anchor point T_min and a second anchor point T_max, wherein when |r_min| ≤ |r_max|, let T_min = r_min and T_max = r_max; otherwise, let T_min = r_max and T_max = r_min;
when |r_min| ≤ |r_max|, taking [T_min, 0] as the first value interval and [-T_min, T_max] as the second value interval;
when |r_min| > |r_max|, taking [T_max, -T_min] as the first value interval and [0, T_min] as the second value interval.
By dividing [r_min, r_max] into two value intervals in this way, the invention reduces the number of parameters used in each iterative computation, thereby shortening the computation time;
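The interval construction can be written directly from these rules; this is a sketch, with intervals returned as (low, high) pairs:

```python
def value_intervals(r_min, r_max):
    """Return the first and second value intervals as (low, high) pairs.

    Implements the rule above: the anchors T_min, T_max are chosen by
    comparing |r_min| with |r_max|, and [r_min, r_max] is split into two
    search intervals so each boundary is found over fewer parameters.
    """
    if abs(r_min) <= abs(r_max):
        t_min, t_max = r_min, r_max
        first = (t_min, 0.0)            # [T_min, 0]
        second = (-t_min, t_max)        # [-T_min, T_max]
    else:
        t_min, t_max = r_max, r_min
        first = (t_max, -t_min)         # [T_max, -T_min]
        second = (0.0, t_min)           # [0, T_min]
    return first, second
```

For r_min = -1 and r_max = 2 this yields (-1, 0) and (1, 2): the negative parameters and the positive tail beyond |r_min| are searched separately.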
the invention also provides a quantization pruning system of the defogging model, which comprises:
the preprocessing module is used for acquiring original parameter data of a defogging model and determining boundary data based on the original parameter data;
the image acquisition module is used for acquiring a fog-containing image;
the configuration module is used for acquiring a plurality of quantization bit numbers which are configured in advance;
the quantization pruning module is used for determining the optimal quantization bit number based on the boundary data and the original parameter data and performing quantization pruning on the defogging model based on the optimal quantization bit number;
the quantization pruning module comprises:
the quantization inverse quantization unit is used for calculating quantization parameter data under the corresponding quantization bit number based on the boundary data and the original parameter data, and performing inverse quantization calculation on the quantization parameter data to obtain corresponding inverse quantization parameter data;
the effect detection unit is used for generating a corresponding defogging test model based on the inverse quantization parameter data, and performing a defogging effect test on the defogging test model by using the fog-containing image to obtain a corresponding test result;
and the quantization pruning unit is used for determining the optimal quantization bit number based on the obtained test result, extracting quantization parameter data corresponding to the optimal quantization bit number, and performing quantization pruning on the defogging model based on the quantization parameter data.
Due to the adoption of the technical scheme, the invention has the remarkable technical effects that:
according to the invention, after the original parameter data is subjected to quantization and inverse quantization operations, the obtained inverse quantization parameter data is utilized to simulate the defogging effect, so that the optimal quantization bit number can be rapidly and accurately determined, and model quantization pruning and hardware deployment are carried out by using the lowest bit number on the premise of meeting the defogging effect, thereby saving hardware resources.
According to the invention, through the design of zero-rate and zero-value distribution data, the parameter redundancy degree is judged according to the zero-rate, and core-level and channel-level structured pruning is carried out on the basis of the zero-value distribution data, so that hardware resources can be further saved and the deployment efficiency of a hardware terminal can be improved.
According to the invention, based on two value intervals, two times of iterative computation are carried out, and an asymmetric boundary pair which is more fit for actual model parameter distribution can be obtained, so that the quantization pruning effect corresponding to each quantization bit number is optimized, the lowest bit number meeting the defogging effect is further reduced, and the hardware resources are further saved.
Detailed Description
The present invention will be described in further detail with reference to examples, which illustrate the invention and are not to be construed as limiting it.
Example 1: a quantization pruning method of a defogging model, as shown in fig. 1, includes the following steps:
s100, data preparation:
s110, acquiring original parameter data of a defogging model, and determining boundary data based on the original parameter data;
the defogging model is a convolutional neural network model of a single precision data type (float) obtained by pre-training.
The original parameter data includes a number of unit parameter data.
The boundary data comprises a plurality of boundary pairs, the boundary pairs correspond to the unit parameter data one by one, and the boundary pairs comprise first boundaries and second boundaries, wherein the first boundaries are smaller than the second boundaries.
The method specifically comprises the following steps:
s111, dividing the defogging model by taking the convolutional layer as a dividing node to obtain a plurality of network units;
those skilled in the art can set the partitioning rule by themselves according to the actual situation, so that each network unit includes one convolutional layer, which is not limited in detail in this specification.
And S112, acquiring single-precision floating point parameters corresponding to each network unit, and acquiring corresponding unit parameter data.
And S113, determining corresponding boundary pairs based on the unit parameter data.
In this embodiment, the minimum parameter r_min in the unit parameter data is used as the first boundary α, and the maximum parameter r_max as the second boundary β.
S120, acquiring a fog-containing image;
s130, acquiring a plurality of quantization bit numbers which are configured in advance;
those skilled in the art can configure the number of quantization bit numbers and the value of each according to actual needs; the quantization bit numbers in this embodiment range over [2, 8].
S200, determining an optimal quantization bit number based on the boundary data and the original parameter data, and performing quantization pruning on the defogging model based on the optimal quantization bit number;
the optimal quantization bit number is the lowest quantization bit number which meets the defogging effect.
The method specifically comprises the following steps:
s210, based on the boundary data and the original parameter data, calculating quantization parameter data under a corresponding quantization bit number, and performing inverse quantization calculation on the quantization parameter data to obtain corresponding inverse quantization parameter data;
in the neural network training and inference stages, the float data type is generally used; quantization refers to mapping the float data in the trained neural network model to corresponding integer data, and inverse quantization is the inverse process.
The method specifically comprises the following steps:
S211, determining a maximum quantization value q_max and a minimum quantization value q_min based on the target quantization bit number;
the maximum quantization value q_max and the minimum quantization value q_min determine the value range of the quantization parameter data under the target quantization bit number, and are calculated as:
q_max = 2^(Bit-1) - 1;
q_min = -2^(Bit-1);
wherein Bit is the target quantization bit number, and the data types of the maximum quantization value q_max and the minimum quantization value q_min are integers.
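For the candidate bit widths of this embodiment, the formula gives the following signed integer ranges (a quick illustration):

```python
# Quantization value range for each candidate bit width Bit in [2, 8]:
# q_max = 2**(Bit-1) - 1, q_min = -2**(Bit-1), i.e. a signed Bit-bit integer.
ranges = {bit: (-(2 ** (bit - 1)), 2 ** (bit - 1) - 1) for bit in range(2, 9)}
# e.g. ranges[8] == (-128, 127) and ranges[2] == (-2, 1)
```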
S212, calculating corresponding scale factor data based on the boundary data, the maximum quantization value and the minimum quantization value;
the scale factor data includes scale factors in one-to-one correspondence with the network units; in this embodiment, the scale factor S corresponding to the target network unit is calculated as:
S = (β - α) / (q_max - q_min);
wherein α is the first boundary of the target network unit, β is the second boundary of the target network unit, and the first boundary α and the second boundary β are of the single-precision floating-point type.
S213, calculating corresponding zero-point data based on the boundary data, the maximum quantization value and the scale factor data;
the zero-point data includes zero points Z in one-to-one correspondence with the network units; in this embodiment, the zero point Z corresponding to the target network unit is calculated as:
Z = round(q_max - β / S);
wherein round(*) denotes the rounding function, β is the second boundary of the target network unit, S is the scale factor of the target network unit, and the zero point Z is of the integer type.
S214, performing quantization translation on the original parameter data based on the scale factor data and the zero-point data to obtain corresponding quantization parameter data;
the original parameter data comprises the single-precision floating-point parameters corresponding to each network unit;
the quantization parameter data comprises integer parameters in one-to-one correspondence with the single-precision floating-point parameters;
in this embodiment, each single-precision floating-point parameter is quantized and translated based on the scale factor and zero point of the network unit where it is located, giving the corresponding integer parameter:
q = round(r / S + Z) - Z;
wherein r is the target single-precision floating-point parameter, S is the scale factor corresponding to the network unit where the target single-precision floating-point parameter is located, Z is the zero point corresponding to that network unit, and q is the integer parameter corresponding to the target single-precision floating-point parameter.
S215, performing inverse quantization on the quantization parameter data to obtain corresponding inverse quantization parameter data;
the inverse quantization parameter data comprises inverse quantization parameters in one-to-one correspondence with the integer parameters, and the data type of the inverse quantization parameters is single-precision floating point;
the inverse quantization parameter is calculated as:
r_r = q × S;
wherein q is the target integer parameter, S is the scale factor corresponding to the network unit where the target integer parameter is located, and r_r is the inverse quantization parameter corresponding to the target integer parameter.
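Steps S211–S215 can be collected into one routine per network unit. This is a sketch; clipping the stored value to [q_min, q_max] is an assumption the formulas leave implicit:

```python
import numpy as np

def quantize_unit(r, alpha, beta, bit):
    """Quantize one network unit's float32 parameters r with boundary pair (alpha, beta).

    Implements S211-S215: q_max = 2^(Bit-1) - 1, q_min = -2^(Bit-1),
    S = (beta - alpha) / (q_max - q_min), Z = round(q_max - beta / S),
    q = round(r / S + Z) - Z, r_r = q * S. Clipping the stored value
    round(r / S + Z) to [q_min, q_max] is an assumed detail.
    """
    q_max = 2 ** (bit - 1) - 1
    q_min = -(2 ** (bit - 1))
    s = (beta - alpha) / (q_max - q_min)                 # scale factor S
    z = int(round(q_max - beta / s))                     # zero point Z (integer)
    stored = np.clip(np.round(r / s + z), q_min, q_max)  # Bit-bit integer on hardware
    q = stored.astype(np.int64) - z                      # integer parameters of the scheme
    r_r = (q * s).astype(np.float32)                     # inverse quantization parameters
    return q, r_r
```

Because the zero point is an integer, the round-trip r → q → r_r loses only the rounding error, which is exactly what the defogging test model is meant to expose.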
S220, generating a corresponding defogging test model based on the inverse quantization parameter data, and performing a defogging effect test on the defogging test model by using the fog-containing image to obtain a corresponding test result;
the method specifically comprises the following steps:
s221, generating a corresponding defogging test model based on the inverse quantization parameter data;
namely, the generated inverse quantization parameter data is used as a model parameter of the defogging model to obtain a corresponding defogging test model;
the defogging test model is used for simulating the defogging effect of the defogging model after quantization.
S222, inputting the fog-containing image into a corresponding defogging test model, and outputting a corresponding defogging test image by the defogging test model.
S223, performing image quality evaluation on the defogging test image, and/or evaluating it against a defogging reference image, to obtain a corresponding test result;
the defogging reference image is acquired as follows:
the fog-containing image is input into the defogging model, the defogging model defogs it, and the corresponding defogging reference image is output.
Those skilled in the art can evaluate the quality of the obtained defogging test image against the defogging reference image using one or more image quality evaluation methods disclosed in the prior art, which this specification does not limit in detail.
And S230, determining the optimal quantization bit number based on the obtained test result.
Those skilled in the art can set the determination method of the optimal quantization bit number according to the actual needs, for example:
testing the defogging effect of the defogging test model corresponding to each quantization bit number in order from the lowest bit number to the highest, obtaining the corresponding test result, until the defogging test model is judged from the test result to reach the preset defogging effect, and taking the quantization bit number used in that test as the optimal quantization bit number;
or respectively testing the defogging effect of the defogging test model corresponding to each quantization bit number to obtain corresponding test results, summarizing all the test results, and determining the optimal quantization bit number based on the change trend of the test results.
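The first strategy (scan from the lowest bit number upward and stop at the first pass) is a simple early-exit loop; `passes_effect_test` stands in for steps S210–S220 and is an assumed hook:

```python
def optimal_bit_number(candidate_bits, passes_effect_test):
    """Return the lowest quantization bit number whose simulated defogging
    test model reaches the preset effect, or None if none qualifies.

    passes_effect_test(bits) is assumed to build the dequantized test model
    at the given bit width and evaluate its defogging result.
    """
    for bits in sorted(candidate_bits):      # low to high: first pass wins
        if passes_effect_test(bits):
            return bits
    return None
```

The early exit matters in practice: lower bit widths are tested first, so the search stops as soon as the cheapest acceptable deployment is found.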
In existing model quantization technology the quantization bit number is usually fixed; because a processor handles whole bytes (1 byte = 8 bits) more easily, 8 bits is conventionally used as the quantization bit number when quantization-pruning a convolutional neural network model;
however, for an FPGA that can compute at an arbitrary bit width, fixing the quantization at 8 bits cannot sufficiently reduce the hardware resources required for model deployment;
the smaller the quantization bit number, the fewer hardware resources are needed to deploy the quantized convolutional neural network model; but when the bit number is too small, the representable value range is limited, so the distribution of the quantization parameter data differs greatly from that of the original parameter data, which affects the final inference result.
Since rounding errors in the quantization process cause a precision loss that affects the defogging effect, while the inverse quantization process introduces no further loss, the lowest quantization bit number that still meets the defogging effect can be determined by calculating the quantization parameter data at each bit number, inversely quantizing it, and running the defogging-effect simulation test on the resulting inverse quantization parameter data; the defogging model is then quantization-pruned and deployed at that bit number.
S240, as shown by the dotted line in fig. 1, extracting quantization parameter data corresponding to the optimal quantization bit number, and performing quantization pruning on the defogging model based on the quantization parameter data;
the method comprises the following specific steps:
s241, counting zero-value parameters based on the quantization parameter data to obtain zero-value distribution data and a zero rate, wherein the zero rate is used for indicating the occupation ratio of the zero-value parameters;
zero-rate = number of zero-valued parameters/total number of parameters;
the zero rate is used to indicate the degree of redundancy that exists for the quantized model;
the zero-value distribution data is used for showing the distribution condition of the zero-value parameters at a core level and a channel level so as to facilitate the structured pruning at the core level and the channel level.
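The statistics of S241 can be gathered as follows; the weight layout `(out_channels, in_channels, kH, kW)` and the use of NumPy are assumptions for illustration:

```python
import numpy as np

def zero_statistics(q_weight):
    """Compute the zero rate and zero-value distribution data of S241.

    Returns the zero rate together with the indices of all-zero kernels and
    all-zero output channels, which is the information needed to decide
    kernel-level and channel-level structured pruning.
    """
    zero_rate = np.count_nonzero(q_weight == 0) / q_weight.size
    kernel_mask = np.all(q_weight == 0, axis=(2, 3))      # (out, in) kernels
    channel_mask = np.all(q_weight == 0, axis=(1, 2, 3))  # output channels
    return {
        "zero_rate": float(zero_rate),
        "zero_kernels": np.argwhere(kernel_mask),          # prunable kernels
        "zero_channels": np.flatnonzero(channel_mask),     # prunable channels
    }
```

The returned index lists are the "zero-value distribution data": an all-zero output channel can be dropped by shrinking the channel count, and an all-zero kernel by removing its multiply-accumulate branch.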
S242, when the zero rate is higher than a preset zero rate threshold value, quantizing the defogging model based on the quantization parameter data and the zero value distribution data, performing core-level and channel-level structured pruning, and outputting an obtained quantization model;
structured pruning saves hardware and accelerates computation, whereas unstructured pruning merely sparsifies the model parameters (matrix sparsification) and achieves neither hardware savings nor acceleration;
in this embodiment, the distribution conditions of zero-value parameters in the core level and the channel level are judged based on zero-value distribution data, and the defogging model is quantized and subjected to core-level and channel-level structured pruning;
the convolution operation is essentially a multiply-accumulate operation, and pruning aims to remove operations whose quantized model parameters are zero so as to save hardware resources; therefore, based on the zero-value distribution data, simultaneous kernel-level and channel-level structured pruning can be realized by modifying the number of output channels and eliminating the parts whose multiply-accumulate sums are 0, further saving hardware resources and improving deployment efficiency on hardware terminals.
Note that even when the zero rate is higher than the preset zero-rate threshold, if the zero-valued parameters are not distributed at the kernel level or the channel level, the defogging model is quantized based only on the quantization parameter data.
S243, when the zero rate is less than or equal to a preset zero rate threshold value, quantizing the defogging model based on the quantization parameter data, and outputting the obtained quantization model;
when the zero rate is less than or equal to the preset zero rate threshold, the redundancy degree is low, pruning processing is not needed, and therefore the defogging model is directly quantized based on the quantization parameter data, and a corresponding quantization model is generated and output for hardware deployment.
Example 2 changes the scheme for determining the boundary pairs in Example 1; the rest is the same as Example 1.
In this embodiment, the specific step of determining the corresponding boundary pair based on the parameter data of each unit is as follows:
and S10, determining a first value interval and a second value interval based on the cell parameter data of the current network cell.
The first value interval is used for indicating the value range of the first boundary, and the second value interval is used for indicating the value range of the second boundary.
The method comprises the following specific steps:
S11, acquiring the minimum parameter r_min and the maximum parameter r_max in the unit parameter data;
S12, determining the first value interval and the second value interval based on the minimum parameter r_min and the maximum parameter r_max;
a person skilled in the art can set the determination mode of the first value interval and the second value interval according to actual needs, for example:
Scheme 1: taking [0, r_min] as the first value interval and [0, r_max] as the second value interval;
Scheme 2:
based on the minimum parameter r_min and the maximum parameter r_max, determining a first anchor point T_min and a second anchor point T_max, wherein when |r_min| ≤ |r_max|, let T_min = r_min and T_max = r_max; otherwise, let T_min = r_max and T_max = r_min;
when |r_min| ≤ |r_max|, taking [T_min, 0] as the first value interval and [-T_min, T_max] as the second value interval;
when |r_min| > |r_max|, taking [T_max, -T_min] as the first value interval and [0, T_min] as the second value interval.
In this embodiment, the first value interval and the second value interval are determined using Scheme 2; in the actual calculation process, a boundary value k_1 can also be determined directly based on [0, T_min] and a boundary value k_2 based on [-T_min, T_max], letting the first boundary α = min(k_1, k_2) and the second boundary β = max(k_1, k_2).
S20, taking the first value interval and the second value interval in turn as the target interval, dividing a plurality of distribution intervals based on the target interval, and performing iterative computation based on the distribution intervals and the unit parameter data to obtain the quantization difference degree corresponding to the unit parameter data in each distribution interval;
the distribution intervals all fall in the target interval, and the starting points of the distribution intervals are the same.
The iterative calculation process of the first boundary and the second boundary is the same, and a person skilled in the art can define the calculation order of the first boundary and the second boundary by himself, and the description does not define the calculation order in detail.
The method comprises the following specific steps:
s21, equally dividing the target interval into a plurality of subintervals, and determining the number of reference intervals based on the target quantization bit number;
in this embodiment, the number of reference intervals is set to 2^(Bit-1), where Bit represents the target quantization bit number;
The number of subintervals obtained by the equal division can be set by a person skilled in the art according to actual needs; it should be larger than 2 times the number of reference intervals, and the greater the number of subintervals, the higher the accuracy of the calculated boundary value. The number of subintervals in this embodiment is set to 2048.
S22, determining the number of accumulated intervals based on the current iteration times and the number of reference intervals;
a person skilled in the art can set the accumulation rule of the interval value according to the actual situation, and the embodiment does not limit the accumulation rule in detail;
for example:
mode 1: taking the number of accumulation intervals adopted in the last iteration as an initial value, generating the number of accumulation intervals corresponding to the current iteration process based on the initial value and a preset accumulation step length, and taking the number of reference intervals as the initial value during the first iteration calculation;
mode 2: and generating an accumulation interval number based on the reference interval number, the current iteration number and a preset accumulation step length.
This embodiment adopts mode 1.
S23, extracting the corresponding number of subintervals in sequence based on the accumulated interval number to obtain the corresponding distribution interval; that is, when the number of accumulation intervals is i, the first i subintervals are extracted to obtain the distribution interval.
S24, extracting unit parameter data adopted in the current iteration process based on the distribution interval to obtain calculation data;
S25, normalizing the calculation data to obtain corresponding original distribution data;
S26, mapping the calculation data to obtain corresponding quantized data;
in this embodiment, a quantize function is used for mapping;
S27, carrying out normalization processing on the quantized data to obtain corresponding quantized distribution data;
S28, calculating the difference degree based on the original distribution data and the quantized distribution data to obtain the corresponding quantization difference degree.
In this embodiment, the KL divergence measurement formula is used to calculate the difference degree.
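As a minimal sketch, the difference-degree calculation of step S28 reduces to the KL divergence between the original distribution data and the quantized distribution data (the eps smoothing for empty bins is an assumption; the patent does not specify it):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between two normalized histograms:
    p is the original distribution data, q the quantized distribution data."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
```

Identical distributions score 0, and the score grows as quantization distorts the distribution, which is what the iterative search in S20 minimizes.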
S30, determining a corresponding boundary value based on the accumulated interval number corresponding to the minimum quantization difference:
the calculation formula is as follows:
K=(i+0.5)×(t/l);
wherein i is the number of accumulated intervals corresponding to the minimum quantization difference degree;
t indicates the span value of the target interval: when the target interval is [0, T_min], t = T_min; when the target interval is [-T_min, T_max], t = T_max - (-T_min) = T_max + T_min;
l is the number of subintervals into which the target interval is equally divided; in this embodiment, l = 2048.
S40, generating a corresponding first boundary or a second boundary based on the boundary value;
namely:
when the target interval is the first value interval, the first boundary α is generated based on the obtained boundary value;
when the target interval is the second value interval, the second boundary β is generated based on the obtained boundary value.
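The whole of steps S20-S40 for one target interval can be sketched as follows, assuming a positive-side target interval [0, t], an accumulation step of 1, and NumPy histograms for the distribution data; these details are illustrative, not prescribed by the patent:

```python
import numpy as np

def search_boundary(data, t, bit=4, l=2048):
    """Sweep accumulated-interval counts over [0, t], score each candidate
    clipping range by the KL divergence between original and quantized
    distributions (S21-S28), and convert the best count i into the
    boundary value K = (i + 0.5) * (t / l) of step S30."""
    data = np.abs(np.asarray(data, dtype=np.float64))
    ref = 2 ** (bit - 1)                     # reference interval count
    best_i, best_kl = ref, float("inf")
    for i in range(ref, l + 1):              # number of accumulated intervals
        hi = i * (t / l)                     # span of the first i subintervals
        clipped = np.clip(data, 0.0, hi)
        step = hi / (ref - 1)                # map onto ref quantization levels
        quant = np.round(clipped / step) * step
        p, _ = np.histogram(clipped, bins=ref, range=(0.0, hi))
        q, _ = np.histogram(quant, bins=ref, range=(0.0, hi))
        p = p / max(p.sum(), 1) + 1e-12      # original distribution data
        q = q / max(q.sum(), 1) + 1e-12      # quantized distribution data
        kl = float(np.sum(p * np.log(p / q)))
        if kl < best_kl:
            best_i, best_kl = kl < best_kl and i or best_i, kl
    return (best_i + 0.5) * (t / l)          # step S30
```

With an outlier-heavy parameter set the search clips well inside [0, t], which is exactly the effect described for the asymmetric boundary pair.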
The boundary pair corresponding to each network unit is regenerated through the above steps, satisfying r_min ≤ α ≤ β ≤ r_max. Relative to the boundary interval [r_min, r_max] of embodiment 1, the asymmetric boundary interval [α, β] obtained in this embodiment conforms better to the actual data distribution, mitigates the influence of outliers on hardware-deployment quantization, and further optimizes the quantization result.
Case (2):
An AOD-Net defogging model is built using the PyTorch deep learning library and the Python language, and is trained and tested on the public RESIDE data set; the model parameter type of the trained defogging model is single-precision floating point (float).
The defogging model is divided by taking the convolutional layers as dividing nodes to obtain 5 network units, each containing one convolutional layer.
make statistics of eachThe single-precision floating point parameters in the network elements are calculated according to the steps S10-S40 in the embodiment 2αAnd a second boundaryβThe results are shown in the following table:
TABLE 1

| Network unit | Minimum parameter r_min | Maximum parameter r_max | First boundary α | Second boundary β |
| --- | --- | --- | --- | --- |
| Layer1 | -0.0815 | 0.4654 | -0.0810 | 0.3100 |
| Layer2 | -0.0244 | 0.0734 | -0.0100 | 0.0500 |
| Layer3 | -0.0316 | 0.0868 | -0.0300 | 0.0800 |
| Layer4 | -0.0248 | 0.0736 | -0.0130 | 0.0600 |
| Layer5 | -0.0181 | 0.0754 | -0.0041 | 0.0590 |
Based on the first boundary α and the second boundary β, the quantization parameter data and the inverse quantization parameter data corresponding to 2 bit, 3 bit, 4 bit, 5 bit, 6 bit, 7 bit and 8 bit are calculated according to step S200, so as to compare the defogging effect and the pruning rate corresponding to each quantization bit number.
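A hedged sketch of the per-bit-number quantization and inverse quantization follows; the exact scale-factor and zero-point formulas below are plausible asymmetric-quantization choices, not copied from the patent:

```python
import numpy as np

def quant_dequant(w, alpha, beta, bit):
    """Asymmetric quantization of parameters w to `bit` bits over the
    boundary pair [alpha, beta], plus the matching inverse quantization
    used to simulate the quantized model's defogging effect."""
    q_min, q_max = -(2 ** (bit - 1)), 2 ** (bit - 1) - 1   # e.g. -8..7 for 4 bit
    scale = (beta - alpha) / (q_max - q_min)               # scale factor data
    zero_point = round(q_max - beta / scale)               # zero point data
    q = np.clip(np.round(np.asarray(w) / scale) + zero_point, q_min, q_max)
    dq = (q - zero_point) * scale                          # inverse quantization
    return q.astype(np.int32), dq
```

For example, with Layer1's boundary pair (α = -0.081, β = 0.31) and 4 bits, a zero weight maps exactly to the zero point and dequantizes back to 0, while weights inside [α, β] round-trip within half a quantization step.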
And (3) defogging effect comparison:
generating a defogging test model corresponding to each quantization bit number based on the inverse quantization parameter data;
A fog-containing image and a reference defogging image are acquired, wherein the reference defogging image is the defogged image obtained after the original defogging model processes the fog-containing image.
And inputting the fog-containing images into the defogging test models respectively to obtain corresponding defogging test images.
In this case, the image mean, the image standard deviation, the average gradient, the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) are selected to form a joint evaluation index, wherein the image mean, the image standard deviation and the average gradient evaluate the image quality of the defogging test image on its own, while PSNR and SSIM are obtained by comparing the reference defogging image with the defogging test image;
the influence of the quantization bit number on the defogging effect is counted based on the joint evaluation index; in this case, a mean-square form is taken over a preset evaluation data set (50 fog-containing pictures). The statistical results are shown in fig. 2, where the SSIM scores are 0.99, 0.98, 0.95 and 0.75 from left to right.
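For reference, the per-image part of the joint evaluation index might be computed as follows (SSIM is omitted for brevity; images are assumed to be float arrays in [0, 1], and the gradient formula is one common definition of "average gradient"):

```python
import numpy as np

def joint_metrics(test_img, ref_img):
    """Image mean, standard deviation and average gradient of the defogging
    test image, plus PSNR against the reference defogged image."""
    t = np.asarray(test_img, dtype=np.float64)
    r = np.asarray(ref_img, dtype=np.float64)
    gy, gx = np.gradient(t)                              # row/column gradients
    avg_grad = np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0))
    mse = np.mean((t - r) ** 2)
    psnr = 10 * np.log10(1.0 / mse) if mse > 0 else float("inf")
    return {"mean": t.mean(), "std": t.std(), "avg_grad": avg_grad, "psnr": psnr}
```

These per-image scores would then be aggregated (here, in mean-square form) over the evaluation data set for each quantization bit number.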
As can be seen from fig. 2, when the quantization bit number is 4 bits or more, the data fluctuation is small, indicating that quantization has little influence on the defogging effect; when the quantization bit number is less than 4 bits, the data fluctuation is large, which corroborates the subjective visual judgment. Therefore, combining subjective and objective evaluation, the invention can realize quantization down to a minimum of 4 bits and thereby save hardware resources.
Note that when the minimum parameter r_min and the maximum parameter r_max in Table 1 are used as the boundary data, the optimal quantization bit number determined by the above method is 5 bits; that is, the scheme disclosed in embodiment 1 can realize quantization down to a minimum of 5 bits.
Pruning rate comparison:
Zero-value parameter statistics are performed based on the quantization parameter data corresponding to each quantization bit number to obtain the zero rate and the zero-value distribution data; the zero rate corresponding to 4 bits in this case is 59.06%. After core-level and channel-level structured pruning is performed based on the zero-value distribution data, the structure and parameter changes are as shown in the following table:
TABLE 2

| Pruning object | Structure before pruning | Structure after pruning | Number of pruned parameters |
| --- | --- | --- | --- |
| Conv1 | 3×3 | 3×3 | 0 |
| Conv2 | 3×3×3×3 | 1×3×3×3 | 54 |
| Conv3 | 3×6×5×5 | 2×4×5×5 | 250 |
| Conv4 | 3×6×7×7 | 2×3×7×7 | 588 |
| Conv5 | 3×12×3×3 | 3×8×3×3 | 108 |
From the above, 1000 zero-valued parameters are pruned in total; the pruning rate is about 56.8% (= 1000/1761), and about 31.3 Kbit (= 1000 × 32 bit / 1024) of model parameter storage can be saved directly.
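The quoted figures can be reproduced arithmetically (assuming the 1761 total counts 1746 weights plus one 3-element bias per layer; the patent states only the totals):

```python
# Structures before pruning, taken from Table 2.
weights = [3 * 3, 3 * 3 * 3 * 3, 3 * 6 * 5 * 5, 3 * 6 * 7 * 7, 3 * 12 * 3 * 3]
pruned = [0, 54, 250, 588, 108]          # last column of Table 2
total = sum(weights) + 5 * 3             # 1746 weights + 15 biases = 1761 (assumed split)
pruning_rate = sum(pruned) / total       # ~0.568
saved_kbit = sum(pruned) * 32 / 1024     # float32 parameters removed: 31.25 Kbit
print(pruning_rate, saved_kbit)
```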
Embodiment 3, a quantitative pruning system for a defogging model, as shown in fig. 3, includes:
the preprocessing module 100 is configured to obtain original parameter data of a defogging model, and determine boundary data based on the original parameter data;
an image acquisition module 200, configured to acquire a fog-containing image;
a configuration module 300, configured to obtain a plurality of quantization bit numbers configured in advance;
a quantization pruning module 400, configured to determine an optimal quantization bit number based on the boundary data and the original parameter data, and perform quantization pruning on the defogging model based on the optimal quantization bit number;
referring to fig. 4, the quantization pruning module 400 includes:
a quantization inverse quantization unit 410, configured to calculate quantization parameter data under a corresponding quantization bit number based on the boundary data and the original parameter data, and perform inverse quantization calculation on the quantization parameter data to obtain corresponding inverse quantization parameter data;
the effect detection unit 420 is configured to generate a corresponding defogging test model based on the inverse quantization parameter data, and perform a defogging effect test on the defogging test model by using the image containing fog to obtain a corresponding test result;
and a quantization pruning unit 430, configured to determine an optimal quantization bit number based on the obtained test result, extract quantization parameter data corresponding to the optimal quantization bit number, and perform quantization pruning on the defogging model based on the quantization parameter data.
Further, the preprocessing module includes:
the partitioning unit is used for partitioning the defogging model by taking the convolutional layer as a partitioning node to obtain a plurality of network units;
the statistical unit is used for acquiring single-precision floating point parameters corresponding to each network unit and acquiring corresponding unit parameter data;
and the boundary calculation unit is used for determining corresponding boundary pairs based on the parameter data of each unit.
Further:
the quantization and inverse quantization unit includes:
a range determination unit configured to determine a maximum quantization value and a minimum quantization value based on a target quantization bit number;
the scale factor calculation unit is used for calculating based on the boundary data, the maximum quantization value and the minimum quantization value to obtain corresponding scale factor data;
the zero point calculation unit is used for calculating based on the boundary data, the maximum quantization value and the scale factor data to obtain corresponding zero point data;
the quantization unit is used for performing quantization translation on the original parameter data based on the scale factor data and the zero point data to obtain corresponding quantization parameter data;
and the inverse quantization unit is used for carrying out inverse quantization on the quantization parameter data to obtain corresponding inverse quantization parameter data.
Further:
the image acquisition module is further used for inputting the fog-containing image into the defogging model to acquire a corresponding defogging reference image;
the effect detection unit includes:
the image generation unit is used for generating a corresponding defogging test model based on the inverse quantization parameter data and inputting the fog-containing image into the corresponding defogging test model, which outputs a corresponding defogging test image;
and the evaluation unit is used for performing image quality evaluation on the defogging test image alone and/or evaluating the defogging test image against the defogging reference image to obtain a corresponding test result.
Further:
the quantization pruning unit comprises:
a zero-value statistics unit, configured to perform statistics on the zero-value parameters based on the quantization parameter data to obtain zero-value distribution data and a zero rate, where the zero rate indicates the proportion of zero-value parameters;
a processing unit to:
when the zero rate is higher than a preset zero rate threshold value, quantizing the defogging model based on the quantized parameter data and the zero value distribution data, performing core-level and channel-level structured pruning, and outputting the obtained quantized model;
otherwise, quantizing the defogging model based on the quantization parameter data, and outputting the obtained quantization model.
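The zero-rate branch above can be sketched as follows (the 0.5 threshold default is illustrative; the patent leaves the threshold to the practitioner):

```python
import numpy as np

def prune_decision(q_params, zero_point=0, threshold=0.5):
    """Count quantized parameters sitting at the zero point and compare
    the resulting zero rate with a threshold: above it, structured pruning
    is performed in addition to quantization; otherwise only quantization."""
    q = np.asarray(q_params)
    zero_rate = float(np.mean(q == zero_point))
    return zero_rate, zero_rate > threshold
```

In the case above, the 4-bit zero rate of 59.06% would exceed a 0.5 threshold, triggering the core-level and channel-level structured pruning branch.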
Further:
the boundary calculation unit includes:
an interval determining unit, configured to determine a first value interval and a second value interval based on unit parameter data of a current network unit, where the first value interval is used to indicate a value range of a first boundary, and the second value interval is used to indicate a value range of a second boundary;
the iterative calculation unit is used for dividing a plurality of distribution intervals based on the target interval, and performing iterative calculation based on the distribution intervals and the unit parameter data to obtain the quantization difference degrees corresponding to the unit parameter data in each distribution interval, wherein the distribution intervals all fall in the target interval, the starting points of the distribution intervals are the same, and the target interval is a first value-taking interval or a second value-taking interval;
and the boundary value generating unit is used for determining a corresponding boundary value based on the distribution interval corresponding to the minimum quantization difference degree.
Further, the iterative computation unit includes:
the distribution determining unit is used for determining a distribution interval corresponding to the current iteration process;
the data extraction unit is used for extracting single-precision floating point parameter data adopted in the current iteration process from the unit parameter data to obtain calculation data;
the mapping unit is used for mapping the calculation data to obtain corresponding quantized data;
the normalization unit is used for performing normalization processing on the calculation data to obtain corresponding original distribution data and is also used for performing normalization processing on the quantization data to obtain corresponding quantization distribution data;
and the difference degree calculation unit is used for calculating the difference degree based on the original distribution data and the quantized distribution data to obtain the corresponding quantized difference degree.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that:
while preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
In addition, it should be noted that the specific embodiments described in the present specification may differ in the shape of the components, the names of the components, and the like. All equivalent or simple changes of the structure, the characteristics and the principle of the invention which are described in the patent conception of the invention are included in the protection scope of the patent of the invention. Various modifications, additions and substitutions for the specific embodiments described may be made by those skilled in the art without departing from the scope of the invention as defined in the accompanying claims.