CN111027684A - Deep learning model quantification method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111027684A
CN111027684A
Authority
CN
China
Prior art keywords
quantized
deep learning
learning model
quantization
convolutional layers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911256850.7A
Other languages
Chinese (zh)
Inventor
屈伟
董峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201911256850.7A
Publication of CN111027684A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a deep learning model quantization method and device, an electronic device, and a storage medium. The method comprises: quantizing each convolutional layer with each of N sets of preset quantization thresholds; calculating, for each convolutional layer, the relative entropy between the layer's initial activation value and each quantized activation value to obtain the layer's N relative entropies; selecting the smallest of the N relative entropies as the layer's minimum relative entropy; selecting, from the quantized convolutional layers to be selected of the plurality of convolutional layers, those to be used for replacement; and replacing the corresponding convolutional layers with the selected quantized convolutional layers to obtain a quantized deep learning model whose quantization error is smaller than a preset precision loss threshold. With the scheme provided by the embodiment of the invention, the deep learning model is quantized while the precision loss of the quantized deep learning model is reduced.

Description

Deep learning model quantification method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of machine learning, in particular to a deep learning model quantization method and device, electronic equipment and a storage medium.
Background
At present, deep learning models are widely used across many industries. However, the running speed of existing deep learning models is still limited by computer memory capacity. To relieve this memory pressure, fixed-point quantization is performed on each convolutional layer of the deep learning model so that fewer bits are used. Existing fixed-point quantization methods mainly convert high-bit floating-point parameters in each convolutional layer into low-bit integer parameters, for example converting 32-bit floating-point parameters into 8-bit integer parameters, thereby turning floating-point numbers into fixed-point numbers. This reduces the number of bits in the weights of the convolutional layers, lowers the storage pressure on the computer, and in turn improves the running speed of the deep learning model.
However, when existing fixed-point quantization methods quantize a deep learning model to increase its running speed, the precision of the model is not considered, and the fixed-point-quantized model carries a relatively large error. Taking the well-known lightweight network MobileNet_v1 SSD as an example, after fixed-point quantization that converts floating-point numbers into fixed-point numbers, the precision of the model drops by 1.57%, a large precision loss. Excessive precision loss is detrimental to the stable application of the deep learning model.
Disclosure of Invention
The embodiment of the invention aims to provide a deep learning model quantization method, which is used for solving the problem of high precision loss in the process of quantizing a deep learning model.
In order to achieve the above object, an embodiment of the present invention provides a deep learning model quantization method, where the deep learning model includes a plurality of convolutional layers to be quantized, each convolutional layer corresponds to N sets of preset quantization thresholds, and an activation value of each convolutional layer is used as an initial activation value, the method includes:
for each convolutional layer, quantizing the convolutional layer by using each group of preset quantization thresholds respectively to obtain N quantized convolutional layers and N quantized activation values of the convolutional layer, wherein the N groups of preset quantization thresholds are in one-to-one correspondence with the N quantized convolutional layers, and the N groups of preset quantization thresholds are in one-to-one correspondence with the N quantized activation values;
respectively calculating the relative entropy of the initial activation value and each quantized activation value of each convolutional layer to obtain N relative entropies of the convolutional layer, wherein N groups of preset quantization threshold values are in one-to-one correspondence with the N relative entropies;
selecting the smallest relative entropy of the N relative entropies of the convolutional layers as the smallest relative entropy of the convolutional layers, wherein the quantized convolutional layer corresponding to the preset quantization threshold corresponding to the smallest relative entropy is used as the quantized convolutional layer to be selected of the convolutional layers;
and selecting a to-be-selected quantized convolutional layer for replacement from the to-be-selected quantized convolutional layers of the plurality of convolutional layers, and replacing the corresponding convolutional layer by using the selected to-be-selected quantized convolutional layer to obtain a quantized deep learning model.
Further, the quantized deep learning model satisfies: and the quantization error of the quantized deep learning model is smaller than a preset precision loss threshold value.
Further, the selecting a quantized convolutional layer to be selected for replacement from among quantized convolutional layers to be selected of the plurality of convolutional layers, and replacing the corresponding convolutional layer with the selected quantized convolutional layer to be selected to obtain a quantized deep learning model includes:
sequencing the quantized convolutional layers to be selected of the plurality of convolutional layers according to the sequence of the minimum relative entropy of the plurality of convolutional layers from small to large;
selecting the first n quantized convolution layers to be selected and the first n+1 quantized convolution layers to be selected in the sequencing result, replacing the corresponding convolution layers, and respectively obtaining a first quantized deep learning model and a second quantized deep learning model;
and when the quantization error of the first quantized deep learning model is smaller than a preset precision loss threshold value and the quantization error of the second quantized deep learning model is not smaller than the preset precision loss threshold value, selecting the first quantized deep learning model as the quantized deep learning model.
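The forward search described above can be sketched as follows, under illustrative assumptions: replace layers one at a time in ascending order of minimum relative entropy, stopping just before the error budget would be exceeded. The names and the `quant_error` callable are hypothetical, not from the patent.

```python
# Illustrative sketch (assumed names, not the patent's implementation): replace
# convolutional layers with their quantized candidates one at a time, in
# ascending order of minimum relative entropy, and stop just before the model's
# quantization error would reach the preset precision-loss threshold.
def select_layers_forward(layers, quant_error, loss_threshold):
    """layers: list of (layer_id, min_relative_entropy) pairs.
    quant_error: callable mapping a list of replaced layer ids to the
    quantization error of the resulting model (hypothetical)."""
    order = [lid for lid, _ in sorted(layers, key=lambda x: x[1])]
    chosen = []
    for lid in order:
        trial = chosen + [lid]
        if quant_error(trial) < loss_threshold:
            chosen = trial        # first n layers: still within the budget
        else:
            break                 # first n+1 layers would exceed the budget
    return chosen
```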
Further, the selecting a quantized convolutional layer to be selected for replacement from among quantized convolutional layers to be selected of the plurality of convolutional layers, and replacing the corresponding convolutional layer with the selected quantized convolutional layer to be selected to obtain a quantized deep learning model includes:
sequencing the quantized convolutional layers to be selected of the plurality of convolutional layers according to the sequence of the minimum relative entropy of the plurality of convolutional layers from large to small;
grouping the quantized convolutional layers to be selected according to the sequencing result, wherein the m-th group comprises: the quantized convolutional layers to be selected that remain after removing the first m quantized convolutional layers to be selected from the sequencing result;
replacing the corresponding convolutional layers with the groups of quantized convolutional layers to be selected, and respectively obtaining quantized deep learning models corresponding to the groups;
and when the quantization error of the quantized deep learning model corresponding to the (m-1)-th group is not less than the preset precision loss threshold and the quantization error of the quantized deep learning model corresponding to the m-th group is less than the preset precision loss threshold, selecting the quantized deep learning model corresponding to the m-th group as the quantized deep learning model.
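The grouping strategy above can likewise be sketched under illustrative assumptions: sort candidates by minimum relative entropy in descending order, drop the first m (worst) candidates for increasing m, and keep the first remaining group whose quantization error fits the budget. `quant_error` is again a hypothetical callable.

```python
# Illustrative sketch of the grouping strategy (assumed names): sort candidates
# by minimum relative entropy in descending order; the m-th group keeps the
# candidates remaining after dropping the first m (worst) ones.  Return the
# first group whose quantization error is below the precision-loss threshold.
def select_layers_backward(layers, quant_error, loss_threshold):
    order = [lid for lid, _ in sorted(layers, key=lambda x: x[1], reverse=True)]
    for m in range(len(order) + 1):
        group = order[m:]         # drop the first m candidates
        if quant_error(group) < loss_threshold:
            return group          # the (m-1)-th group's error was too large
    return []
```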
Further, the relative entropy is calculated using the following formula:
$$D_{KL}(P \| Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)}$$
where $D_{KL}(P \| Q)$ denotes the relative entropy and describes the degree of difference between $P(x)$ and $Q(x)$; $x$ denotes the quantization threshold, $X$ the range of quantization thresholds, $P(x)$ the distribution of activation values before quantization, and $Q(x)$ the corresponding distribution of activation values after quantization with $x$ as the threshold.
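As a hedged numerical illustration of the formula above, the relative entropy can be computed from activation histograms, with P(x) the normalized histogram before quantization and Q(x) the normalized histogram after quantization under a given threshold; the small epsilon guarding against empty bins is an implementation assumption, not part of the patent.

```python
import math

def relative_entropy(p_counts, q_counts, eps=1e-12):
    """D_KL(P || Q) between two activation histograms, normalized to
    probability distributions first.  eps guards empty bins (an assumption)."""
    p_sum, q_sum = sum(p_counts), sum(q_counts)
    p = [c / p_sum for c in p_counts]
    q = [c / q_sum for c in q_counts]
    # Sum over all bins x of P(x) * log(P(x) / Q(x)).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
```

Identical distributions yield a relative entropy of zero; the more the quantized distribution diverges from the original, the larger the value.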
Further, the quantization error of the quantized deep learning model is calculated as follows:
for the deep learning model, inputting a data validation set into the deep learning model, performing a preset number of model inference runs, and taking the precision obtained during model inference as the original precision, wherein the data validation set comprises a preset amount of validation data;
for the quantized deep learning model, inputting the data validation set into the quantized deep learning model, performing the preset number of model inference runs, and taking the precision obtained during model inference as the quantized precision;
and calculating the difference between the original precision and the quantized precision as the quantization error of the quantized deep learning model.
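The three steps above can be sketched as follows. `run_inference` is a hypothetical callable returning the precision of one inference pass over the validation set, and averaging across the preset number of runs is an assumption about how the repeated runs are aggregated.

```python
# Sketch of the error measurement (assumed names, not the patent's code).
def measure_precision(model, validation_set, run_inference, num_runs=3):
    """Average precision over a preset number of inference runs."""
    total = 0.0
    for _ in range(num_runs):
        total += run_inference(model, validation_set)
    return total / num_runs

def quantization_error(original, quantized, validation_set, run_inference,
                       num_runs=3):
    """Original precision minus quantized precision."""
    orig = measure_precision(original, validation_set, run_inference, num_runs)
    quant = measure_precision(quantized, validation_set, run_inference,
                              num_runs)
    return orig - quant
```

A quantized model is accepted when this difference stays below the preset precision loss threshold.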
Further, the deep learning model is obtained by converting an original deep learning model with the high-performance neural network inference engine TensorRT; alternatively,
the deep learning model is obtained by converting an original deep learning model with the Open Visual Inference and Neural network Optimization toolkit OpenVINO.
In order to achieve the above object, an embodiment of the present invention further provides a deep learning model quantization apparatus, where the deep learning model includes a plurality of convolutional layers to be quantized, each convolutional layer corresponds to N sets of preset quantization thresholds, and an activation value of each convolutional layer is used as an initial activation value, the apparatus includes:
a quantization module, configured to quantize, for each convolutional layer, the convolutional layer by using each set of preset quantization threshold, respectively, to obtain N quantized convolutional layers and N quantized post-activation values of the convolutional layer, where N sets of preset quantization thresholds are in one-to-one correspondence with the N quantized convolutional layers, and N sets of preset quantization thresholds are in one-to-one correspondence with the N quantized post-activation values;
the first calculation module is used for respectively calculating the relative entropy of the initial activation value and each quantized activation value of each convolution layer to obtain N relative entropies of the convolution layer, and N groups of preset quantization threshold values are in one-to-one correspondence with the N relative entropies;
a selecting module, configured to select, for each convolution layer, a smallest relative entropy of the N relative entropies of the convolution layer as a smallest relative entropy of the convolution layer, where a quantized convolution layer corresponding to a preset quantization threshold corresponding to the smallest relative entropy is used as a to-be-selected quantized convolution layer of the convolution layer;
and the replacing module is used for selecting a to-be-selected quantized convolutional layer for replacement from the to-be-selected quantized convolutional layers of the plurality of convolutional layers, and replacing the corresponding convolutional layer by using the selected to-be-selected quantized convolutional layer to obtain a quantized deep learning model.
Further, the quantized deep learning model satisfies: and the quantization error of the quantized deep learning model is smaller than a preset precision loss threshold value.
Further, the replacement module includes:
the first sequencing submodule is used for sequencing the quantized convolutional layers to be selected of the plurality of convolutional layers according to the sequence that the minimum relative entropy of the plurality of convolutional layers is from small to large;
the first selection submodule is used for selecting the first n quantized convolution layers to be selected and the first n+1 quantized convolution layers to be selected in the sequencing result, replacing the corresponding convolution layers, and respectively obtaining a first quantized deep learning model and a second quantized deep learning model;
and the second selection submodule is used for selecting the first quantized deep learning model as the quantized deep learning model when the quantization error of the first quantized deep learning model is smaller than a preset precision loss threshold and the quantization error of the second quantized deep learning model is not smaller than the preset precision loss threshold.
Further, the replacement module includes:
the second ordering submodule is used for ordering the quantized convolutional layers to be selected of the plurality of convolutional layers according to the sequence of the minimum relative entropy of the plurality of convolutional layers from large to small;
a grouping submodule, configured to group the quantized convolutional layers to be selected according to the sequencing result, wherein the m-th group comprises: the quantized convolutional layers to be selected that remain after removing the first m quantized convolutional layers to be selected from the sequencing result;
the first replacement submodule is used for replacing the corresponding convolutional layer with each group of quantized convolutional layers to be selected to respectively obtain a quantized deep learning model corresponding to each group;
and the third selection submodule is used for selecting the quantized deep learning model corresponding to the m-th group as the quantized deep learning model when the quantization error of the quantized deep learning model corresponding to the (m-1)-th group is not less than the preset precision loss threshold and the quantization error of the quantized deep learning model corresponding to the m-th group is less than the preset precision loss threshold.
Further, the first calculating module is specifically configured to calculate the relative entropy by using the following formula:
$$D_{KL}(P \| Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)}$$
where $D_{KL}(P \| Q)$ denotes the relative entropy and describes the degree of difference between $P(x)$ and $Q(x)$; $x$ denotes the quantization threshold, $X$ the range of quantization thresholds, $P(x)$ the distribution of activation values before quantization, and $Q(x)$ the corresponding distribution of activation values after quantization with $x$ as the threshold.
Further, the apparatus further includes:
the second calculation module is used for calculating the quantization error of the quantized deep learning model by adopting the following steps:
for the deep learning model, inputting a data validation set into the deep learning model, performing a preset number of model inference runs, and taking the precision obtained during model inference as the original precision, wherein the data validation set comprises a preset amount of validation data;
for the quantized deep learning model, inputting the data validation set into the quantized deep learning model, performing the preset number of model inference runs, and taking the precision obtained during model inference as the quantized precision;
and calculating the difference between the original precision and the quantized precision as the quantization error of the quantized deep learning model.
Further, the deep learning model is obtained by converting an original deep learning model with the high-performance neural network inference engine TensorRT; alternatively,
the deep learning model is obtained by converting an original deep learning model with the Open Visual Inference and Neural network Optimization toolkit OpenVINO.
In order to achieve the above object, an embodiment of the present invention provides an electronic device, which includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
and a processor, configured to implement any of the above deep learning model quantization method steps when executing the program stored in the memory.
In order to achieve the above object, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the deep learning model quantization method steps described above.
In order to achieve the above object, an embodiment of the present invention further provides a computer program product containing instructions, which when run on a computer, causes the computer to perform any of the above-mentioned deep learning model quantization method steps.
The embodiment of the invention has the following beneficial effects:
the deep learning model quantization method provided by the embodiment of the present invention quantizes, for each convolutional layer, the convolutional layer by using each set of preset quantization thresholds, so as to obtain N quantized convolutional layers and N quantized post-activation values of the convolutional layer, respectively calculate the relative entropy of the initial activation value of the convolutional layer and each quantized post-activation value, so as to obtain N relative entropies of the convolutional layer, select the smallest relative entropy of the N relative entropies of the convolutional layer, as the smallest relative entropy of the convolutional layer, use the quantized convolutional layer corresponding to the preset quantization threshold corresponding to the smallest relative entropy as the quantized post-convolution layer to be selected of the convolutional layer, from the quantized post-convolution layers to be selected of the convolutional layers, and selecting the convolution layer to be selected for quantization to replace the corresponding convolution layer, and obtaining a quantized deep learning model with the quantization error smaller than a preset precision loss threshold value. By adopting the method provided by the embodiment of the invention, part or all of the quantized convolutional layers to be selected are selected from the quantized convolutional layers to be selected of the plurality of convolutional layers to replace the corresponding convolutional layers, so that the obtained quantized deep learning model meets the following requirements: and the quantization error of the deep learning model after quantization is smaller than a preset precision loss threshold value. 
By calculating the quantization error of the quantized deep learning model and keeping it below the preset precision loss threshold, the precision loss of the quantized deep learning model is reduced while the model is quantized, and the quantization error can meet application requirements.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a first flowchart of a deep learning model quantization method according to an embodiment of the present invention;
FIG. 2 is a second flowchart of a deep learning model quantization method according to an embodiment of the present invention;
FIG. 3 is a third flowchart of a deep learning model quantization method according to an embodiment of the present invention;
fig. 4 is a first structural diagram of a deep learning model quantization apparatus according to an embodiment of the present invention;
fig. 5 is a structural diagram of a second deep learning model quantization apparatus according to an embodiment of the present invention;
fig. 6 is a third structural diagram of a deep learning model quantization apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 8a is a schematic diagram illustrating a linear quantization principle of a deep learning model according to an embodiment of the present invention;
fig. 8b is a schematic diagram of a nonlinear quantization principle of the deep learning model according to the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
Because quantizing a deep learning model in the prior art incurs a large precision loss, to solve this technical problem an embodiment of the present invention provides a deep learning model quantization method, where the deep learning model includes a plurality of convolutional layers to be quantized, each convolutional layer corresponds to N sets of preset quantization thresholds, and the activation value of each convolutional layer is used as an initial activation value. As shown in fig. 1, the method includes:
step 101, quantizing each convolutional layer by using each group of preset quantization thresholds respectively to obtain N quantized convolutional layers and N quantized activation values of the convolutional layer, where N groups of preset quantization thresholds are in one-to-one correspondence with the N quantized convolutional layers, and N groups of preset quantization thresholds are in one-to-one correspondence with the N quantized activation values.
Step 102, for each convolution layer, calculating the relative entropy of the initial activation value and each quantized activation value of the convolution layer respectively to obtain N relative entropies of the convolution layer, wherein N groups of preset quantization threshold values are in one-to-one correspondence with the N relative entropies.
Step 103, selecting the smallest relative entropy of the N relative entropies of the convolutional layer as the smallest relative entropy of the convolutional layer, and selecting the quantized convolutional layer corresponding to the preset quantization threshold corresponding to the smallest relative entropy as the quantized convolutional layer to be selected of the convolutional layer.
And 104, selecting a quantized convolutional layer to be selected for replacement from quantized convolutional layers to be selected of the plurality of convolutional layers, and replacing the corresponding convolutional layer by using the selected quantized convolutional layer to be selected to obtain a quantized deep learning model.
By adopting the method provided by the embodiment of the invention, part or all of the quantized convolutional layers to be selected are selected from the quantized convolutional layers to be selected of the plurality of convolutional layers to replace the corresponding convolutional layers, so that the obtained quantized deep learning model meets the following requirements: and the quantization error of the deep learning model after quantization is smaller than a preset precision loss threshold value. By calculating the quantization error of the quantized deep learning model, the quantization error of the quantized deep learning model is smaller than the preset precision loss threshold, so that the precision loss of the quantized deep learning model is reduced while the deep learning is quantized, and the quantization error of the quantized deep learning model can meet the application requirement.
Further, in the deep learning model quantization method shown in fig. 1, the plurality of convolutional layers to be quantized may be all convolutional layers included in the deep learning model, or may be partial convolutional layers pre-selected based on actual needs, for example, a convolutional layer with a relatively long operation time is selected as the convolutional layer to be quantized.
In the embodiment of the invention, N takes the values N = 1, 2, ….
The deep learning model quantization method and apparatus provided by the embodiments of the present invention are described in detail below with specific embodiments.
The embodiment of the invention discloses a deep learning model quantization method, wherein a deep learning model comprises a plurality of convolution layers to be quantized, each convolution layer corresponds to N groups of preset quantization threshold values, and the activation value of each convolution layer is used as an initial activation value, wherein the N groups of preset quantization threshold values corresponding to each convolution layer can be specifically set according to the maximum value of a floating point type parameter of the convolution layer.
The deep learning model quantization method disclosed by the embodiment of the invention, as shown in fig. 2, may include the following steps:
step 201, quantizing each convolutional layer by using each group of preset quantization thresholds respectively to obtain N quantized convolutional layers and N quantized activation values of the convolutional layer, where N groups of preset quantization thresholds are in one-to-one correspondence with the N quantized convolutional layers, and N groups of preset quantization thresholds are in one-to-one correspondence with the N quantized activation values.
In this step, for each convolution layer, when the positive and negative distributions of the floating point type parameters of the convolution layer are uniform, the maximum value of the absolute values of the floating point type parameters of the convolution layer may be selected as the preset quantization threshold of the convolution layer, that is, the convolution layer corresponds to 1 set of preset quantization threshold; when the positive and negative distributions of the floating point type parameters of the convolutional layer are not uniform, N groups of preset quantization thresholds can be selected and preset as the preset quantization thresholds of the convolutional layer, and the N value can be specifically set according to the non-uniform degree of the positive and negative distributions of the floating point type parameters of the convolutional layer.
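This threshold-selection rule can be sketched as follows. The balance tolerance and the evenly spaced candidate grid are assumptions for illustration; the patent does not specify how the N candidate thresholds are generated.

```python
# Sketch of the threshold-selection rule above (assumed names and parameters).
def candidate_thresholds(params, n_candidates=8, balance_tol=0.2):
    max_abs = max(abs(p) for p in params)
    pos = sum(1 for p in params if p > 0)
    neg = sum(1 for p in params if p < 0)
    if abs(pos - neg) <= balance_tol * len(params):
        return [max_abs]          # uniform +/- distribution: one threshold
    # Non-uniform distribution: N candidate thresholds up to max(|w|).
    return [max_abs * (i + 1) / n_candidates for i in range(n_candidates)]
```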
In this step, for each convolution layer, quantizing the convolution layer by using each set of preset quantization threshold, which may specifically be:
and aiming at each convolution layer, establishing a mapping relation between the floating point type parameters and the fixed point number of the convolution layer by using each group of preset quantization threshold values, and mapping the floating point type parameters in the range of the preset quantization threshold values of the convolution layer into the fixed point number based on each group of preset quantization threshold values. The quantization method for the convolutional layer may be, specifically, a linear quantization method or a nonlinear quantization method.
For example, fig. 8a shows the principle of linear quantization of convolutional layer A, where -|max| and |max| denote the preset quantization thresholds, the data represented by solid black dots within the range (-|max|, |max|) are the floating-point parameters of convolutional layer A, and the data represented by solid black dots within the range (-127, 127) are the fixed-point integer data to which those floating-point parameters are mapped. As shown in fig. 8a, the positive and negative distributions of the floating-point parameters of convolutional layer A are uniform, so the maximum absolute value |max| of the floating-point parameters can be selected as the preset quantization threshold of convolutional layer A, and convolutional layer A can then be linearly quantized:
the specific mapping rule may be: and r is equal to S (q-z), wherein the conversion relation between the quantization fixed point number and the floating point number is represented, wherein r represents an 8-bit fixed point number value, q represents a quantization value, S represents a scaling factor of a floating point type parameter, and z represents a fixed point integer corresponding to a real number 0. As shown in fig. 8a, 32-bit floating-point parameters in the range of the preset quantization thresholds (— max |, | max |) in convolutional layer a may be mapped to fixed-point numbers in the range of 8 bits.
Fig. 8b shows the principle of saturated quantization of convolutional layer B, where -|T| and |T| represent the preset quantization thresholds, the data represented by solid black dots within the range (-|T|, |T|) represent the floating-point parameters of convolutional layer B within the preset quantization threshold range, the data represented by open circles outside the range (-|T|, |T|) represent the floating-point parameters of convolutional layer B outside the preset quantization threshold range, and the data represented by solid black dots within the range (-127, 127) represent the fixed-point integer data mapped from the floating-point parameters of convolutional layer B. As shown in fig. 8b, the positive and negative distributions of the floating-point parameters of convolutional layer B are not uniform; a threshold |T| smaller than the maximum absolute value of the floating-point parameters may be selected as a preset quantization threshold of convolutional layer B, and convolutional layer B may then be quantized with truncation:
the specific mapping rule may be:
for a high-order floating-point parameter, it can be non-linearly quantized, and the high-order floating-point parameter is mapped to a low-order fixed-point parameter. As shown in fig. 8B, when the range of the 32-bit floating-point type parameter of the convolution layer B is (-T |, | T |), the convolution layer B may be nonlinearly quantized, and the 32-bit floating-point type parameter in the range of (-T |, |) in the convolution layer B is mapped to an 8-bit fixed-point parameter in the range of (-127, 127), as shown in fig. 8B, the floating-point type parameter with the absolute value of a part of the parameter value greater than the threshold T may be truncated, and the truncated part of the floating-point type parameter is uniformly represented by the maximum fixed-point number of-127 after quantization.
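A minimal sketch of the truncated mapping for a non-uniform distribution such as that of convolutional layer B follows; values outside [-T, T] are clipped to the boundary before the linear mapping, so they end up at the extreme fixed-point values. The function name and sample values are illustrative.

```python
import numpy as np

def saturated_quantize(weights, T, num_bits=8):
    # Saturated (truncated) quantization: parameters outside [-T, T]
    # are clipped first, then linearly mapped to [-127, 127] for 8 bits.
    qmax = 2 ** (num_bits - 1) - 1
    clipped = np.clip(weights, -T, T)   # truncate out-of-range parameters
    scale = T / qmax
    return np.round(clipped / scale).astype(np.int8), scale

w_b = np.array([-3.0, -0.8, 0.0, 0.4, 2.5], dtype=np.float32)
q_b, s_b = saturated_quantize(w_b, T=0.8)
```

Note that -3.0 and 2.5 lie outside the threshold range and are therefore represented by the boundary values -127 and 127 after quantization.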
Step 202, for each convolution layer, calculating the relative entropy of the initial activation value and each quantized activation value of the convolution layer respectively to obtain N relative entropies of the convolution layer, wherein N groups of preset quantization threshold values are in one-to-one correspondence with the N relative entropies.
In this step, for each convolutional layer, the relative entropy of the initial activation value and each quantized activation value of the convolutional layer can be calculated by using the following formula:
D_KL(P|Q) = Σ_{x∈X} P(x) · log( P(x) / Q(x) )
where D_KL(P|Q) represents the relative entropy and is used to describe the degree of difference between P(x) and Q(x); x represents the quantization threshold, X represents the range of the quantization threshold, P(x) represents the distribution of activation values before quantization, and Q(x) represents the corresponding distribution of activation values after quantization with x as the threshold.
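As an illustration only, the relative entropy between activation-value histograms collected before and after quantization could be computed as sketched below; the histogram values are invented for the example.

```python
import numpy as np

def relative_entropy(p_hist, q_hist, eps=1e-12):
    # D_KL(P|Q) = sum over x of P(x) * log(P(x) / Q(x))
    p = np.asarray(p_hist, dtype=np.float64)
    q = np.asarray(q_hist, dtype=np.float64)
    p = p / p.sum()          # normalize histograms to distributions
    q = q / q.sum()
    mask = p > 0             # terms with P(x) = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

before = [10, 20, 40, 20, 10]   # activation histogram before quantization
after = [12, 18, 40, 18, 12]    # histogram after quantizing with threshold x
d = relative_entropy(before, after)
```

A smaller value of d indicates that the quantized activation distribution is closer to the original one, which is exactly the selection criterion used in step 203.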
Step 203, aiming at each convolution layer, selecting the smallest relative entropy of the N relative entropies of the convolution layer as the smallest relative entropy of the convolution layer, and taking the quantized convolution layer corresponding to the preset quantization threshold value corresponding to the smallest relative entropy as the quantized convolution layer to be selected of the convolution layer.
In one possible implementation, for example, convolutional layer C may correspond to 5 sets of preset quantization thresholds: L1, L2, L3, L4 and L5. Through step 201, the 5 sets of preset quantization thresholds L1, L2, L3, L4 and L5 are used to quantize convolutional layer C respectively, obtaining 5 quantized convolutional layers and 5 quantized activation values of convolutional layer C. Continuing through step 202, the relative entropy of the initial activation value and each quantized activation value of convolutional layer C is calculated respectively, obtaining 5 relative entropies of convolutional layer C: D_KL1, D_KL2, D_KL3, D_KL4 and D_KL5, with a one-to-one correspondence between the 5 sets of preset quantization thresholds and the 5 relative entropies.

Through this step, the smallest relative entropy min{D_KL1, D_KL2, D_KL3, D_KL4, D_KL5} can be selected as the minimum relative entropy of convolutional layer C, and the quantized convolutional layer corresponding to the preset quantization threshold associated with this minimum relative entropy can be taken as the to-be-selected quantized convolutional layer of convolutional layer C.
In this step, the minimum relative entropy represents the minimum distribution difference between the floating point type parameter of the convolution layer and the fixed point parameter of the quantized convolution layer corresponding to the convolution layer, so that the precision loss of the deep learning model after quantization can be reduced.
Step 204: sort the to-be-selected quantized convolutional layers of the plurality of convolutional layers in ascending order of the minimum relative entropies of the plurality of convolutional layers.
Step 205: select the first n and the first n+1 to-be-selected quantized convolutional layers in the sorting result to replace the corresponding convolutional layers, obtaining a first quantized deep learning model and a second quantized deep learning model respectively.
In this step, n is 1, 2, …. According to the sorting result of step 204, for example, the first 1 to-be-selected quantized convolution layers and the first 2 to-be-selected quantized convolution layers in the sorting result may be selected to replace the corresponding convolution layers, so as to obtain a first quantized deep learning model and a second quantized deep learning model, respectively.
Step 206: judge whether the quantization error of the first quantized deep learning model is smaller than the preset precision loss threshold and the quantization error of the second quantized deep learning model is not smaller than the preset precision loss threshold; if the determination result is yes, execute step 207a, and if the determination result is no, execute step 207b.
In this step, the preset precision loss threshold may be specifically set according to the precision requirement of the actual application.
The quantization error of the quantized deep learning model may specifically be calculated in the following manner:
aiming at the deep learning model, inputting a data verification set into the deep learning model, executing preset times of model reasoning, and taking the obtained model reasoning time as the original time precision, wherein the data verification set comprises: a preset amount of verification data;
inputting the data verification set into the quantized deep learning model aiming at the quantized deep learning model, executing preset times of model reasoning, and taking the obtained model reasoning time as the quantization time precision;
and calculating the difference value of the original time length precision and the quantized time length precision to be used as the quantization error of the quantized deep learning model.
The specific calculation process of the quantization error of the quantized deep learning model may be a process as follows:
A data verification set of the deep learning model can be obtained first. Then, for the deep learning model, the data verification set is input into the deep learning model, a preset number of rounds of model inference is executed, and the obtained model inference duration is taken as the original time length precision; for the quantized deep learning model, the data verification set is input into the quantized deep learning model, the same preset number of rounds of model inference is executed, and the obtained model inference duration is taken as the quantization time length precision; the difference between the original time length precision and the quantization time length precision is calculated as the quantization error of the quantized deep learning model, where the preset amount of data and the preset number of rounds can be set according to the practical application. For example, for the deep learning model DNN-1, 1 million images in the image verification set of DNN-1 and the category information corresponding to those images may be acquired as the data verification set. For the deep learning model, the data verification set is input and, for example, 1 thousand rounds of model inference are executed, and the obtained model inference duration is taken as the original time length precision A1; for the quantized deep learning model, the data verification set is input, the same 1 thousand rounds of model inference are executed, and the obtained model inference duration is taken as the quantization time length precision A2. The difference between the original time length precision A1 and the quantization time length precision A2 is calculated, and the difference (A1-A2) can be used as the quantization error of the quantized deep learning model.
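The duration-based measurement described above can be sketched as follows; the stand-in model functions and the tiny validation set are purely illustrative, since the patent does not fix a concrete model:

```python
import time

def timed_inference(model_fn, dataset, num_runs):
    # Run num_runs rounds of inference over the validation set and
    # return the total model inference duration.
    start = time.perf_counter()
    for _ in range(num_runs):
        for sample in dataset:
            model_fn(sample)
    return time.perf_counter() - start

# stand-ins for the original and the quantized deep learning model
original_model = lambda x: x * 2.0
quantized_model = lambda x: x * 2.0
validation_set = [1.0, 2.0, 3.0]

a1 = timed_inference(original_model, validation_set, num_runs=10)   # original time length precision
a2 = timed_inference(quantized_model, validation_set, num_runs=10)  # quantization time length precision
quantization_error = a1 - a2   # difference taken as the quantization error
```

In a real setting the two models would be the deployed floating-point and fixed-point networks, and the validation set would hold the preset amount of verification data.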
Step 207a: select the first quantized deep learning model as the quantized deep learning model.

Step 207b: add 1 to the value of n in step 205 to obtain a new value of n, and return to execute step 205.
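The search loop of steps 204 to 208 can be sketched as follows, assuming hypothetical helpers build_model(indices) (builds a model with the given candidate layers substituted) and quant_error(model) (computes the quantization error as in step 206):

```python
def search_first_n(min_entropies, build_model, quant_error, loss_threshold):
    # Steps 204-208: sort candidate quantized layers by ascending minimum
    # relative entropy, then find n such that replacing the first n layers
    # keeps the quantization error below the threshold while replacing
    # the first n + 1 layers does not.
    order = sorted(range(len(min_entropies)), key=lambda i: min_entropies[i])
    n = 1
    while n < len(order):
        first = quant_error(build_model(order[:n]))       # first model
        second = quant_error(build_model(order[:n + 1]))  # second model
        if first < loss_threshold and second >= loss_threshold:
            return build_model(order[:n])
        n += 1
    return build_model(order)  # every replacement stayed under the threshold

# illustrative check: the toy error grows with the number of replaced layers
selected = search_first_n(min_entropies=[0.3, 0.1, 0.4, 0.2],
                          build_model=lambda idx: list(idx),
                          quant_error=lambda m: 0.1 * len(m),
                          loss_threshold=0.25)
```

With these toy values the loop stops at n = 2, keeping the two candidates with the smallest minimum relative entropies.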
By adopting the method provided by the embodiment of the invention, the to-be-selected quantized convolutional layers of the plurality of convolutional layers are obtained and sorted, and to-be-selected quantized convolutional layers in the sorting result are selected to replace the corresponding convolutional layers, so as to obtain a first quantized deep learning model whose quantization error is smaller than the preset precision loss threshold and a second quantized deep learning model whose quantization error is not smaller than the preset precision loss threshold; the first quantized deep learning model is then selected as the quantized deep learning model. In this way, while the quantization error of the quantized deep learning model meets the preset precision loss threshold requirement, more quantized convolutional layers are retained and the deep learning model is quantized to a greater degree, so that the running speed of the deep learning model is improved while the precision requirement is met.
In another embodiment of the present invention, as shown in fig. 3, a deep learning model quantization method provided in an embodiment of the present invention may include the following steps:
steps 301 to 303 are respectively the same as steps 201 to 203, and are not described again here.
Step 304: sort the to-be-selected quantized convolutional layers of the plurality of convolutional layers in descending order of the minimum relative entropies of the plurality of convolutional layers.
Step 305: group the to-be-selected quantized convolutional layers according to the sorting result, and replace the corresponding convolutional layers with the to-be-selected quantized convolutional layers of each group to obtain the quantized deep learning model corresponding to each group, wherein the m-th group comprises: the to-be-selected quantized convolutional layers remaining after the first m to-be-selected quantized convolutional layers in the sorting result are removed.
In this step, m is 1, 2, …. According to the sorting result of step 304, the to-be-selected quantized convolutional layers remaining after removing the first 1 to-be-selected quantized convolutional layer from the sorting result are selected as the first group, and the to-be-selected quantized convolutional layers in the first group replace the corresponding convolutional layers to obtain the quantized deep learning model corresponding to the first group. Following this rule, the (m-1)-th group of to-be-selected quantized convolutional layers and its corresponding quantized deep learning model, as well as the m-th group of to-be-selected quantized convolutional layers and its corresponding quantized deep learning model, can be obtained.
Step 306: judge whether the quantization error of the quantized deep learning model corresponding to the (m-1)-th group is not smaller than the preset precision loss threshold and the quantization error of the quantized deep learning model corresponding to the m-th group is smaller than the preset precision loss threshold; if the determination result is yes, execute step 307a, and if the determination result is no, execute step 307b.
In this step, the preset precision loss threshold may be specifically set according to the precision requirement of the actual application.
The specific calculation manner of the quantization error of the quantized deep learning model is as in step 206, which is not described herein again.
Step 307a: select the quantized deep learning model corresponding to the m-th group as the quantized deep learning model.

Step 307b: add 1 to the value of m in step 305 to obtain a new value of m, and return to execute step 306 with the quantized deep learning models corresponding to the new values of m and m-1.
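Analogously, the loop of steps 304 to 307 can be sketched with the same kind of hypothetical helpers; here the m-th group keeps all candidates except the first m in the descending sort:

```python
def search_by_removal(min_entropies, build_model, quant_error, loss_threshold):
    # Steps 304-307: sort candidates by descending minimum relative entropy;
    # increase m until the (m-1)-th group violates the threshold while the
    # m-th group (first m candidates removed) satisfies it.
    order = sorted(range(len(min_entropies)),
                   key=lambda i: min_entropies[i], reverse=True)
    m = 1
    while m <= len(order):
        prev_err = quant_error(build_model(order[m - 1:]))  # (m-1)-th group
        curr_err = quant_error(build_model(order[m:]))      # m-th group
        if prev_err >= loss_threshold and curr_err < loss_threshold:
            return build_model(order[m:])
        m += 1
    return build_model([])  # even the empty group violated the threshold

# illustrative check with a toy error proportional to the group size
kept = search_by_removal(min_entropies=[0.3, 0.1, 0.4, 0.2],
                         build_model=lambda idx: list(idx),
                         quant_error=lambda m: 0.1 * len(m),
                         loss_threshold=0.35)
```

With these toy values the candidate with the largest minimum relative entropy is removed first, and the remaining three candidates are kept.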
By adopting the method provided by the embodiment of the invention, the to-be-selected quantized convolutional layers of the plurality of convolutional layers are obtained and sorted, and to-be-selected quantized convolutional layers in the sorting result are selected to replace the corresponding convolutional layers, so as to obtain an (m-1)-th-group quantized deep learning model whose quantization error is not smaller than the preset precision loss threshold and an m-th-group quantized deep learning model whose quantization error is smaller than the preset precision loss threshold; the m-th-group quantized deep learning model is then selected as the quantized deep learning model. In this way, while the quantization error of the quantized deep learning model meets the preset precision loss threshold requirement, more quantized convolutional layers are retained and the deep learning model is quantized to a greater degree, so that the running speed of the deep learning model is improved while the precision requirement is met.
The deep learning model in the embodiment of the present invention may specifically be a DNN (Deep Neural Network), which may include: a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), or an LSTM (Long Short-Term Memory network).
The deep learning model in the embodiment of the invention can be used in various fields such as the field of image processing, the technical field of voice recognition and the like.
The deep learning model in the embodiment of the invention can be specifically used in the field of image processing:
Target classification: target classification is a target recognition problem based on a classification task, i.e., the computer finds out, from given data, which items are the desired targets, for example, cat-and-dog classification or flower classification;

Target detection: target detection can determine the specific position of a target to be detected in the current image; it has wide application and is often applied to power system inspection, medical image detection, and the like;

Target segmentation: in the deep learning field, research on target segmentation is mainly divided into semantic segmentation and instance segmentation. Semantic segmentation classifies each pixel in an image, judging which pixels belong to which target; instance segmentation further judges, among the pixels belonging to targets, which belong to a first target and which belong to a second target. At present, a key application in medical images is the segmentation of human organs.
The deep learning model in the embodiment of the invention can be specifically used in the technical field of speech recognition:
Speech recognition: the aim of speech recognition is to transmit natural language to a computer in the form of an acoustic signal, to be understood and responded to by the computer. An application scenario of speech recognition may be: driving navigation software that guides routes and broadcasts road conditions for drivers through speech recognition technology.
In the embodiment of the present invention, when the deep learning model is a model for performing image processing, the verification data included in the data verification set may be a verification image, and when the deep learning model is a model for performing voice processing, the verification data included in the data verification set may be voice data.
In the embodiment of the invention, the deep learning model may be a model obtained by converting an original deep learning model based on TensorRT (a high-performance neural network inference engine); alternatively, the deep learning model may be a model obtained by converting an original deep learning model based on OpenVINO (an open visual inference and neural network optimization toolkit); alternatively, the deep learning model may be an original deep learning model that has not undergone any optimization processing.
In a possible implementation manner, with the scheme provided by the embodiment of the present invention, an SSD (Single Shot MultiBox Detector) model was quantized on a CPU platform, and the SSD model and the quantized SSD model were then tested respectively; the test results are shown in table 1:
the model of the tested CPU is 6248, where FP32 represents that the convolution layer of the unquantized SSD model is a 32-bit floating-point parameter, and Int8 represents that the convolution layer of the quantized SSD model is an 8-bit fixed-point integer parameter. As can be seen from table 1, for all cores of the CPU: the test time after quantization was 0.015s, the test time before quantization was 0.0216 s; CPU for a single core: the test time after quantization is 0.3s, and the test time before quantization is 0.5 s; the quantization error of the quantized SSD model was calculated to be 0.3%.
Therefore, by adopting the method provided by the embodiment of the invention, the deep learning model can be quantized to a greater extent while the quantization error of the quantized deep learning model meets the requirement of the preset precision loss threshold, so that the running speed of the deep learning model is increased while the precision requirement is met.
Precision                     All CPU cores    Single CPU core
FP32 (before quantization)    0.0216 s         0.5 s
Int8 (after quantization)     0.015 s          0.3 s

Table 1: running time test results before and after SSD model quantization based on the CPU platform
Based on the same inventive concept, and corresponding to the deep learning model quantization method provided in the above embodiment of the present invention, another embodiment of the present invention further provides a deep learning model quantization apparatus, a schematic structural diagram of which is shown in fig. 4. The apparatus specifically includes:
a quantization module 401, configured to quantize, for each convolutional layer, the convolutional layer by using each set of preset quantization thresholds respectively, so as to obtain N quantized convolutional layers and N quantized activation values of the convolutional layer, where the N sets of preset quantization thresholds are in one-to-one correspondence with the N quantized convolutional layers, and the N sets of preset quantization thresholds are in one-to-one correspondence with the N quantized activation values;
a first calculating module 402, configured to calculate, for each convolution layer, a relative entropy between an initial activation value of the convolution layer and each quantized activation value, to obtain N relative entropies of the convolution layer, where N groups of preset quantization thresholds are in one-to-one correspondence with the N relative entropies;
a selecting module 403, configured to select, for each convolution layer, a smallest relative entropy of the N relative entropies of the convolution layer as a smallest relative entropy of the convolution layer, where a quantized convolution layer corresponding to a preset quantization threshold corresponding to the smallest relative entropy is used as a to-be-selected quantized convolution layer of the convolution layer;
a replacing module 404, configured to select a quantized convolutional layer to be selected for replacement from among quantized convolutional layers to be selected of the plurality of convolutional layers, and replace a corresponding convolutional layer with the selected quantized convolutional layer to be selected, so as to obtain a quantized deep learning model.
It can be seen that, with the apparatus provided by the embodiment of the present invention, by selecting part or all of the to-be-selected quantized convolutional layers of the plurality of convolutional layers to replace the corresponding convolutional layers, the obtained quantized deep learning model satisfies: the quantization error of the quantized deep learning model is smaller than the preset precision loss threshold. By calculating the quantization error of the quantized deep learning model and ensuring that it is smaller than the preset precision loss threshold, the precision loss of the quantized deep learning model is reduced while the deep learning model is quantized, so that the quantization error of the quantized deep learning model can meet the application requirement.
Further, the quantized deep learning model satisfies the following conditions: and the quantization error of the deep learning model after quantization is smaller than a preset precision loss threshold value.
Further, as shown in fig. 5, the replacing module 404 includes:
a first sorting submodule 501, configured to sort the to-be-selected quantized convolutional layers of the plurality of convolutional layers in ascending order of the minimum relative entropies of the plurality of convolutional layers;
a first selection submodule 502, configured to select the first n and the first n+1 to-be-selected quantized convolutional layers in the sorting result to replace the corresponding convolutional layers, obtaining a first quantized deep learning model and a second quantized deep learning model respectively;
the second selecting submodule 503 is configured to select the first quantized deep learning model as the quantized deep learning model when the quantization error of the first quantized deep learning model is smaller than the preset precision loss threshold and the quantization error of the second quantized deep learning model is not smaller than the preset precision loss threshold.
Further, as shown in fig. 6, the replacing module 404 includes:
a second sorting submodule 601, configured to sort, according to a descending order of minimum relative entropy of the plurality of convolutional layers, the convolutional layers to be selected and quantized among the plurality of convolutional layers;
a grouping submodule 602, configured to group the to-be-selected quantized convolutional layers according to the sorting result, where the m-th group comprises: the to-be-selected quantized convolutional layers remaining after the first m to-be-selected quantized convolutional layers in the sorting result are removed;
a first replacing submodule 603, configured to replace the corresponding convolutional layers with the to-be-selected quantized convolutional layers of each group to obtain the quantized deep learning model corresponding to each group;
a third selecting submodule 604, configured to select the quantized deep learning model corresponding to the m-th group as the quantized deep learning model when the quantization error of the quantized deep learning model corresponding to the (m-1)-th group is not smaller than the preset precision loss threshold and the quantization error of the quantized deep learning model corresponding to the m-th group is smaller than the preset precision loss threshold.
Further, the first calculating module 402 is specifically configured to calculate the relative entropy by using the following formula:
D_KL(P|Q) = Σ_{x∈X} P(x) · log( P(x) / Q(x) )
where D_KL(P|Q) represents the relative entropy and is used to describe the degree of difference between P(x) and Q(x); x represents the quantization threshold, X represents the range of the quantization threshold, P(x) represents the distribution of activation values before quantization, and Q(x) represents the corresponding distribution of activation values after quantization with x as the threshold.
Further, as shown in fig. 4, the deep learning model quantization apparatus further includes:
a second calculating module 405, configured to calculate a quantization error of the quantized deep learning model by using the following steps:
aiming at the deep learning model, inputting a data verification set into the deep learning model, executing preset times of model reasoning, and taking the obtained model reasoning time as the original time precision, wherein the data verification set comprises: a preset amount of verification data;
inputting the data verification set into the quantized deep learning model aiming at the quantized deep learning model, executing preset times of model reasoning, and taking the obtained model reasoning time as the quantization time precision;
and calculating the difference value of the original time length precision and the quantized time length precision to be used as the quantization error of the quantized deep learning model.
Further, the deep learning model is obtained by converting the original deep learning model based on TensorRT; or the deep learning model is obtained by converting the original deep learning model based on OpenVINO.
Therefore, by adopting the apparatus provided by the embodiment of the invention, the to-be-selected quantized convolutional layers of the plurality of convolutional layers are obtained and sorted, and to-be-selected quantized convolutional layers in the sorting result are selected to replace the corresponding convolutional layers, so that the obtained quantized deep learning model just meets the condition that the quantization error is smaller than the preset precision loss threshold. While the quantization error of the quantized deep learning model meets the preset precision loss threshold requirement, more quantized convolutional layers are retained and the deep learning model is quantized to a greater degree, thereby improving the running speed of the deep learning model while meeting the precision requirement.
Based on the same inventive concept, and corresponding to the deep learning model quantization method provided by the foregoing embodiment of the present invention, another embodiment of the present invention further provides an electronic device. Referring to fig. 7, the electronic device according to the embodiment of the present invention includes a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702 and the memory 703 communicate with each other through the communication bus 704.
A memory 703 for storing a computer program;
the processor 701 is configured to implement the following steps when executing the program stored in the memory 703:
for each convolutional layer, quantizing the convolutional layer by using each group of preset quantization thresholds respectively to obtain N quantized convolutional layers and N quantized activation values of the convolutional layer, wherein the N groups of preset quantization thresholds are in one-to-one correspondence with the N quantized convolutional layers, and the N groups of preset quantization thresholds are in one-to-one correspondence with the N quantized activation values;
respectively calculating the relative entropy of the initial activation value and each quantized activation value of each convolutional layer to obtain N relative entropies of the convolutional layer, wherein N groups of preset quantization threshold values are in one-to-one correspondence with the N relative entropies;
selecting the smallest relative entropy of the N relative entropies of the convolutional layers as the smallest relative entropy of the convolutional layers, wherein the quantized convolutional layer corresponding to the preset quantization threshold corresponding to the smallest relative entropy is used as the quantized convolutional layer to be selected of the convolutional layers;
selecting a to-be-selected quantized convolutional layer for replacement from the to-be-selected quantized convolutional layers of the plurality of convolutional layers, and replacing the corresponding convolutional layer by using the selected to-be-selected quantized convolutional layer to obtain a quantized deep learning model, wherein the quantization error of the quantized deep learning model is smaller than a preset precision loss threshold.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the deep learning model quantization methods described above.
In yet another embodiment, a computer program product containing instructions is provided, which when run on a computer, causes the computer to perform any one of the deep learning model quantization methods of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present invention are produced, in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus, the electronic device and the storage medium, since they are substantially similar to the method embodiments, the description is relatively simple, and the relevant points can be referred to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (16)

1. A deep learning model quantization method is characterized in that the deep learning model comprises a plurality of convolutional layers to be quantized, each convolutional layer corresponds to N sets of preset quantization threshold values, and the activation value of each convolutional layer is used as an initial activation value, the method comprises the following steps:
for each convolutional layer, quantizing the convolutional layer by using each group of preset quantization thresholds respectively to obtain N quantized convolutional layers and N quantized activation values of the convolutional layer, wherein the N groups of preset quantization thresholds are in one-to-one correspondence with the N quantized convolutional layers, and the N groups of preset quantization thresholds are in one-to-one correspondence with the N quantized activation values;
respectively calculating the relative entropy of the initial activation value and each quantized activation value of each convolutional layer to obtain N relative entropies of the convolutional layer, wherein N groups of preset quantization threshold values are in one-to-one correspondence with the N relative entropies;
selecting the smallest relative entropy of the N relative entropies of the convolutional layers as the smallest relative entropy of the convolutional layers, wherein the quantized convolutional layer corresponding to the preset quantization threshold corresponding to the smallest relative entropy is used as the quantized convolutional layer to be selected of the convolutional layers;
and selecting a to-be-selected quantized convolutional layer for replacement from the to-be-selected quantized convolutional layers of the plurality of convolutional layers, and replacing the corresponding convolutional layer by using the selected to-be-selected quantized convolutional layer to obtain a quantized deep learning model.
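As an illustrative sketch outside the claims, the per-layer candidate selection of claim 1 could be expressed as follows. The histogram inputs, the `quantized_hists` list, and both function names are hypothetical; the only fixed ingredient from the claim is choosing the quantized variant with the smallest relative entropy against the initial activation values.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D_KL(P || Q) between two discrete distributions
    given as (possibly unnormalized) histograms of equal length."""
    ps, qs = float(sum(p)), float(sum(q))
    return sum((pi / ps) * math.log((pi / ps + eps) / (qi / qs + eps))
               for pi, qi in zip(p, q))

def select_candidate(initial_hist, quantized_hists):
    """Return (index, entropy) of the quantized variant whose activation
    histogram is closest to the original layer's activations."""
    kls = [kl_divergence(initial_hist, h) for h in quantized_hists]
    best = min(range(len(kls)), key=kls.__getitem__)
    return best, kls[best]
```

Each of the N sets of preset quantization thresholds yields one entry in `quantized_hists`; the returned index identifies the to-be-selected quantized convolutional layer for that layer.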
2. The method of claim 1, wherein the quantized deep learning model satisfies: and the quantization error of the quantized deep learning model is smaller than a preset precision loss threshold value.
3. The method of claim 2, wherein selecting a quantized convolutional layer to be replaced from among quantized convolutional layers to be selected from the plurality of convolutional layers, and replacing the corresponding convolutional layer with the selected quantized convolutional layer to be selected to obtain a quantized deep learning model, comprises:
sorting the quantized convolutional layers to be selected of the plurality of convolutional layers in ascending order of the minimum relative entropies of the plurality of convolutional layers;

selecting the first n and the first n+1 quantized convolutional layers to be selected in the sorting result to replace the corresponding convolutional layers, respectively obtaining a first quantized deep learning model and a second quantized deep learning model;
and when the quantization error of the first quantized deep learning model is smaller than a preset precision loss threshold value and the quantization error of the second quantized deep learning model is not smaller than the preset precision loss threshold value, selecting the first quantized deep learning model as the quantized deep learning model.
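A minimal sketch of the ascending search in claim 3, under the assumption that some callback `quant_error(n)` measures the quantization error of the model with the first n sorted candidates replaced (the callback and all names are placeholders, not part of the patent):

```python
def greedy_replace(candidates, quant_error, loss_threshold):
    """candidates: (layer_id, min_relative_entropy) pairs.
    quant_error(n): hypothetical callback returning the quantization error
    when the first n sorted candidates replace their layers.
    Grows n while the error stays under the threshold, i.e. stops at the
    first model whose successor (n+1 replacements) would break it."""
    order = sorted(candidates, key=lambda c: c[1])  # smallest entropy first
    n = 0
    while n < len(order) and quant_error(n + 1) < loss_threshold:
        n += 1
    return [layer_id for layer_id, _ in order[:n]]
```

With a monotonically growing error this returns exactly the claim's "first quantized deep learning model": the largest n whose error is below the preset precision loss threshold while n+1 is not.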
4. The method of claim 2, wherein selecting a quantized convolutional layer to be replaced from among quantized convolutional layers to be selected from the plurality of convolutional layers, and replacing the corresponding convolutional layer with the selected quantized convolutional layer to be selected to obtain a quantized deep learning model, comprises:
sorting the quantized convolutional layers to be selected of the plurality of convolutional layers in descending order of the minimum relative entropies of the plurality of convolutional layers;

grouping the quantized convolutional layers to be selected according to the sorting result, wherein the mth group comprises: the quantized convolutional layers to be selected that remain after the first m quantized convolutional layers to be selected in the sorting result are removed;

replacing the corresponding convolutional layers with each group of quantized convolutional layers to be selected, respectively obtaining the quantized deep learning model corresponding to each group;

and when the quantization error of the quantized deep learning model corresponding to the (m-1)th group is not smaller than the preset precision loss threshold and the quantization error of the quantized deep learning model corresponding to the mth group is smaller than the preset precision loss threshold, selecting the quantized deep learning model corresponding to the mth group as the quantized deep learning model.
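The descending variant of claim 4 can be sketched the same way; here `quant_error(kept)` is a hypothetical callback evaluating the model built from the kept candidates, and the function returns the first (smallest m) group whose error falls under the threshold:

```python
def grouped_replace(candidates, quant_error, loss_threshold):
    """Sort candidates by min relative entropy, largest first; the mth
    group keeps everything after the first m candidates. Return the
    first group whose quantization error drops under the threshold
    (so group m-1 was still at or above it)."""
    order = sorted(candidates, key=lambda c: c[1], reverse=True)
    for m in range(len(order) + 1):
        kept = order[m:]
        if quant_error(kept) < loss_threshold:
            return [layer_id for layer_id, _ in kept]
    return []
```

Dropping the highest-entropy candidates first reflects the intuition that the layers whose quantized activations diverge most from the originals are the first ones worth restoring to full precision.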
5. The method of claim 1, wherein the relative entropy is calculated using the formula:
$$D_{KL}(P \| Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)}$$

wherein $D_{KL}(P \| Q)$ represents the relative entropy and is used to describe the degree of difference between P(x) and Q(x); x represents the quantization threshold, X represents the range of the quantization threshold, P(x) represents the distribution of activation values before quantization, and Q(x) represents the corresponding distribution of activation values after quantization with x as the threshold.
6. The method according to any one of claims 1 to 5, wherein the quantization error of the quantized deep learning model is calculated by:
and aiming at the deep learning model, inputting the data verification set into the deep learning model, executing preset times of model reasoning, and taking the obtained model reasoning time as the original time precision, wherein the data verification set comprises: a preset amount of verification data;
inputting the data verification set into the quantized deep learning model aiming at the quantized deep learning model, executing preset times of model reasoning, and taking the obtained model reasoning time as the quantization time precision;
and calculating the difference value of the original time length precision and the quantized time length precision to be used as the quantization error of the quantized deep learning model.
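A hedged sketch of the timing-based error measure in claim 6. The model functions, the validation set, and the run count are all placeholders; the only element taken from the claim is defining the quantization error as the difference between the original and quantized inference times over the same validation data:

```python
import time

def inference_seconds(model_fn, dataset, runs=3):
    """Wall-clock time for `runs` inference passes over the validation set
    (model_fn and dataset are hypothetical stand-ins)."""
    start = time.perf_counter()
    for _ in range(runs):
        for sample in dataset:
            model_fn(sample)
    return time.perf_counter() - start

def quantization_error(original_fn, quantized_fn, dataset, runs=3):
    """Difference between original and quantized inference times,
    following the measure described in claim 6."""
    return (inference_seconds(original_fn, dataset, runs)
            - inference_seconds(quantized_fn, dataset, runs))
```

Because wall-clock timings are noisy, a real measurement would average over many runs and warm up the hardware first; the sketch keeps only the structure of the claim.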
7. The method according to claim 1, wherein the deep learning model is obtained by converting an original deep learning model based on the high-performance neural network inference engine TensorRT; or,

the deep learning model is obtained by converting an original deep learning model based on the Open Visual Inference and Neural network Optimization toolkit OpenVINO.
8. A deep learning model quantization apparatus, wherein the deep learning model includes a plurality of convolutional layers to be quantized, each convolutional layer corresponds to N sets of preset quantization threshold values, and an activation value of each convolutional layer is used as an initial activation value, the apparatus comprising:
a quantization module, configured to quantize, for each convolutional layer, the convolutional layer by using each set of preset quantization threshold, respectively, to obtain N quantized convolutional layers and N quantized post-activation values of the convolutional layer, where N sets of preset quantization thresholds are in one-to-one correspondence with the N quantized convolutional layers, and N sets of preset quantization thresholds are in one-to-one correspondence with the N quantized post-activation values;
the first calculation module is used for respectively calculating the relative entropy of the initial activation value and each quantized activation value of each convolution layer to obtain N relative entropies of the convolution layer, and N groups of preset quantization threshold values are in one-to-one correspondence with the N relative entropies;
a selecting module, configured to select, for each convolution layer, a smallest relative entropy of the N relative entropies of the convolution layer as a smallest relative entropy of the convolution layer, where a quantized convolution layer corresponding to a preset quantization threshold corresponding to the smallest relative entropy is used as a to-be-selected quantized convolution layer of the convolution layer;
and the replacing module is used for selecting a to-be-selected quantized convolutional layer for replacement from the to-be-selected quantized convolutional layers of the plurality of convolutional layers, and replacing the corresponding convolutional layer by using the selected to-be-selected quantized convolutional layer to obtain a quantized deep learning model.
9. The apparatus of claim 8, wherein the quantized deep learning model satisfies: and the quantization error of the quantized deep learning model is smaller than a preset precision loss threshold value.
10. The apparatus of claim 9, wherein the replacement module comprises:
the first sorting submodule is used for sorting the quantized convolutional layers to be selected of the plurality of convolutional layers in ascending order of the minimum relative entropies of the plurality of convolutional layers;

the first selection submodule is used for selecting the first n and the first n+1 quantized convolutional layers to be selected in the sorting result to replace the corresponding convolutional layers, respectively obtaining a first quantized deep learning model and a second quantized deep learning model;
and the second selection submodule is used for selecting the first quantized deep learning model as the quantized deep learning model when the quantization error of the first quantized deep learning model is smaller than a preset precision loss threshold and the quantization error of the second quantized deep learning model is not smaller than the preset precision loss threshold.
11. The apparatus of claim 9, wherein the replacement module comprises:
the second sorting submodule is used for sorting the quantized convolutional layers to be selected of the plurality of convolutional layers in descending order of the minimum relative entropies of the plurality of convolutional layers;

a grouping submodule, configured to group the quantized convolutional layers to be selected according to the sorting result, wherein the mth group comprises: the quantized convolutional layers to be selected that remain after the first m quantized convolutional layers to be selected in the sorting result are removed;
the first replacement submodule is used for replacing the corresponding convolutional layer with each group of quantized convolutional layers to be selected to respectively obtain a quantized deep learning model corresponding to each group;
and the third selection submodule is used for selecting the quantized deep learning model corresponding to the mth group as the quantized deep learning model when the quantization error of the quantized deep learning model corresponding to the (m-1)th group is not smaller than the preset precision loss threshold and the quantization error of the quantized deep learning model corresponding to the mth group is smaller than the preset precision loss threshold.
12. The apparatus according to claim 8, wherein the first calculating module is specifically configured to calculate the relative entropy using the following formula:
$$D_{KL}(P \| Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)}$$

wherein $D_{KL}(P \| Q)$ represents the relative entropy and is used to describe the degree of difference between P(x) and Q(x); x represents the quantization threshold, X represents the range of the quantization threshold, P(x) represents the distribution of activation values before quantization, and Q(x) represents the corresponding distribution of activation values after quantization with x as the threshold.
13. The apparatus of any of claims 8-12, further comprising:
the second calculation module is used for calculating the quantization error of the quantized deep learning model by adopting the following steps:
and aiming at the deep learning model, inputting the data verification set into the deep learning model, executing preset times of model reasoning, and taking the obtained model reasoning time as the original time precision, wherein the data verification set comprises: a preset amount of verification data;
inputting the data verification set into the quantized deep learning model aiming at the quantized deep learning model, executing preset times of model reasoning, and taking the obtained model reasoning time as the quantization time precision;
and calculating the difference value of the original time length precision and the quantized time length precision to be used as the quantization error of the quantized deep learning model.
14. The apparatus according to claim 8, wherein the deep learning model is obtained by converting an original deep learning model based on the high-performance neural network inference engine TensorRT; or,

the deep learning model is obtained by converting an original deep learning model based on the Open Visual Inference and Neural network Optimization toolkit OpenVINO.
15. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-6 when executing a program stored in the memory.
16. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 6.
CN201911256850.7A 2019-12-10 2019-12-10 Deep learning model quantification method and device, electronic equipment and storage medium Pending CN111027684A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911256850.7A CN111027684A (en) 2019-12-10 2019-12-10 Deep learning model quantification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911256850.7A CN111027684A (en) 2019-12-10 2019-12-10 Deep learning model quantification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111027684A true CN111027684A (en) 2020-04-17

Family

ID=70208274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911256850.7A Pending CN111027684A (en) 2019-12-10 2019-12-10 Deep learning model quantification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111027684A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287673A (en) * 2020-10-23 2021-01-29 广州云趣信息科技有限公司 Method for realizing voice navigation robot based on deep learning
WO2022216109A1 (en) * 2021-04-09 2022-10-13 Samsung Electronics Co., Ltd. Method and electronic device for quantizing dnn model
CN116011569A (en) * 2023-03-28 2023-04-25 山东浪潮科学研究院有限公司 Quantization error debugging method, device, equipment and storage medium
CN116011569B (en) * 2023-03-28 2023-07-18 山东浪潮科学研究院有限公司 Quantization error debugging method, device, equipment and storage medium
CN116739039A (en) * 2023-05-05 2023-09-12 北京百度网讯科技有限公司 Quantization method, device, equipment and medium of distributed deployment model

Similar Documents

Publication Publication Date Title
CN111027684A (en) Deep learning model quantification method and device, electronic equipment and storage medium
US11948066B2 (en) Processing sequences using convolutional neural networks
CN109754066B (en) Method and apparatus for generating a fixed-point neural network
US11086912B2 (en) Automatic questioning and answering processing method and automatic questioning and answering system
KR20190034985A (en) Method and apparatus of artificial neural network quantization
CN110874625A (en) Deep neural network quantification method and device
CN109726391B (en) Method, device and terminal for emotion classification of text
CN111026544A (en) Node classification method and device of graph network model and terminal equipment
CN110457459B (en) Dialog generation method, device, equipment and storage medium based on artificial intelligence
CN111382248A (en) Question reply method and device, storage medium and terminal equipment
CN114491115B (en) Multi-model fusion integrated image retrieval method based on deep hash
CN115374792A (en) Policy text labeling method and system combining pre-training and graph neural network
CN111563161B (en) Statement identification method, statement identification device and intelligent equipment
CN110889290B (en) Text encoding method and apparatus, text encoding validity checking method and apparatus
CN113408696A (en) Fixed point quantization method and device of deep learning model
CN112561050B (en) Neural network model training method and device
CN116956997A (en) LSTM model quantization retraining method, system and equipment for time sequence data processing
CN113434630B (en) Customer service evaluation method, customer service evaluation device, terminal equipment and medium
CN111027318B (en) Industry classification method, device and equipment based on big data and storage medium
CN114139678A (en) Convolutional neural network quantization method and device, electronic equipment and storage medium
CN114118363A (en) Quantitative perception training method of convolutional neural network and convolutional neural network structure
CN112800813B (en) Target identification method and device
CN117251574B (en) Text classification extraction method and system based on multi-feature data fusion
CN110738233B (en) Model training method, data classification method, device, electronic equipment and storage medium
CN113257239B (en) Voice recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination