CN116306837A - Adaptive basis function superposition quantization method and system based on different network types - Google Patents


Info

Publication number: CN116306837A
Application number: CN202310172732.8A
Authority: CN (China)
Prior art keywords: quantization, quantized, neural network, weight, basis function
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 李涛, 熊大鹏, 胡建伟
Assignee (current and original): Suzhou Yizhu Intelligent Technology Co ltd
Filing date: 2023-02-28
Publication date: 2023-06-23

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses an adaptive basis function superposition quantization method and system based on different network types, belonging to the technical field of neural network model compression. The method comprises the following steps: S1: acquire the neural network weights; S2: quantize the neural network weights with each of several candidate basis functions to obtain the quantized weights and accuracy results for each basis function, then search for the basis function scheme with the lowest accuracy loss and adaptively search for the optimal quantization parameter configuration; S3: repeat the above steps until all weight layers of the neural network are quantized, then collect the optimal quantization parameter configurations and the quantized weight layers; S4: output the quantization result for the whole neural network. The invention reduces the loss of network accuracy while keeping a high compression ratio, making quantization more flexible and efficient.

Description

Adaptive basis function superposition quantization method and system based on different network types
Technical Field
The invention relates to the technical field of neural network model compression, and in particular to an adaptive basis function superposition quantization method and system based on different network types.
Background
With the rapid development of neural networks, the number of parameters in neural network models keeps growing, and with it the energy consumption and running time of the networks. For the many users with limited resources, it has become extremely important to compress existing large models to fit their actual circumstances.
Common compression methods for neural network models include quantization, knowledge distillation, and pruning. The invention focuses on quantization, i.e., converting high-precision, high-bit-width original data into low-bit-width data through some algorithm or paradigm, thereby reducing the model size while preserving model accuracy. In current research and production, the common quantization methods are uniform and non-uniform quantization. A representative uniform scheme is INT quantization, which rounds data to a uniformly spaced grid of fixed pitch. A common non-uniform approach is additive powers-of-two quantization (APoT), which quantizes each value to a superposition of powers of 2 and achieves the quantization effect by recording the exponents instead of the original data.
Most existing quantization schemes carry out network quantization with 2 as the base. However, studying the weights of existing neural networks shows that their distribution is uncertain, and the range of weights that a base-2 quantization scheme can fit is quite limited, which can potentially reduce accuracy after quantization.
Disclosure of Invention
The invention aims to provide an adaptive basis function superposition quantization method and system based on different network types, so as to solve the technical problem of achieving higher post-quantization accuracy at the same quantization bit width.
The invention is realized by adopting the following technical scheme: the adaptive basis function superposition quantization method based on different network types comprises the following steps:
S1: acquire the neural network weights;
S2: quantize the neural network weights with each of several candidate basis functions to obtain the quantized weights and accuracy results for each basis function, then search for the basis function scheme with the lowest accuracy loss and adaptively search for the optimal quantization parameter configuration;
S3: repeat the above steps until all weight layers of the neural network are quantized, then collect the optimal quantization parameter configurations and the quantized weight layers;
S4: output the quantization result for the whole neural network.
Further, step S1 includes the following sub-steps:
S11: for any given neural network, confirm its network structure;
S12: analyze the given neural network and separate out each weight layer;
S13: take out one layer of network weights, following the order of the network structure, as the network layer to be quantized.
Further, the basis functions include a polynomial basis, namely 1, k and k²; different values of k are specified, where k is a natural number.
Further, the quantization formula is: scale × (b0×1 + b1×k + b2×k²), where b0, b1 and b2 each take the value 0 or 1, scale is a scaling factor given globally within the current weight layer, and the different combinations of b0, b1 and b2 yield 8 different quantization levels.
Further, the quantization level with the smallest difference from the original neural network weight is selected, and b0, b1 and b2 are recorded as the quantized number, so that high-precision data are quantized into low-bit-width data.
Further, step S2 is specifically: quantize the whole weight layer of the neural network with the quantization algorithm to obtain a fully quantized weight layer and its accuracy result; during this process the weights of the other weight layers are locked and not modified. Then replace the basis function and repeat the quantization several times to obtain the quantized weights and accuracy results of the different basis functions.
Further, after the quantized weights and accuracy results of the different basis functions are obtained, a search is performed: the basis function scheme with the lowest accuracy loss is selected, and the optimal quantization parameter configuration is obtained by adaptive search.
Further, step S3 includes the following sub-steps:
S31: check whether all weight layers have been quantized; if so, proceed to step S32, otherwise return to step S1;
S32: after all weight layers have been quantized, collect the quantization parameter configurations recorded in step S2 and the quantized weight layers obtained.
Further, step S4 is specifically: output the quantization methods obtained in steps S1-S3, the quantization accuracy of the whole quantized neural network, and the quantization compression ratio.
The adaptive basis function superposition quantization system based on different network types comprises an acquisition module, a quantization module, a judgment module and an output module. The acquisition module acquires the neural network weights. The quantization module quantizes the neural network weights with each of several candidate basis functions to obtain the quantized weights and accuracy results for each basis function, searches for the basis function scheme with the lowest accuracy loss, and adaptively searches for the optimal quantization parameter configuration. The judgment module repeats the above steps until all weight layers of the neural network are quantized, and collects the optimal quantization parameter configurations and quantized weight layers. The output module outputs the quantization result of the whole neural network.
The invention has the following beneficial effects: the invention expands the search space of existing quantization algorithms, and by exploiting the mathematical characteristics of different basis functions, the neural network loses less accuracy while keeping a high compression ratio, so quantization becomes more flexible and efficient. The invention meets the existing demand for compression quantization of neural networks and suggests a possible direction for the future development of compression quantization.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the present invention;
fig. 2 is a flowchart of a basis function quantization algorithm.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Example 1
Referring to fig. 1, the adaptive basis function superposition quantization method based on different network types performs adaptive basis function superposition quantization for different network structures, and specifically comprises the following steps:
Step 0: input any neural network and confirm its network structure.
Step 1: analyze the given neural network and separate out each weight layer. Depending on the network structure, this may yield tens or hundreds of layers of neural network weights awaiting quantization.
Step 2: take out one layer of network weights, following the order of the network structure, as the network layer to be quantized.
Step 3: select the basis functions to be used. In this embodiment a polynomial basis is selected, i.e., 1, k, k², and different values of k are specified, such as k = 2, 3, 5, ….
Step 4: quantize and search the network weights selected in step 2 using the different basis functions given in step 3.
In this embodiment, step 4 is specifically as follows. First, see fig. 2 for a schematic diagram of the adaptive basis function quantization algorithm for a given basis function: for 32-bit raw data x as input and a given k, the algorithm quantizes x into the form scale × (b0×1 + b1×k + b2×k²), where b0, b1 and b2 may each be 0 or 1, and scale is a scaling factor given globally for the layer. The combinations of b0, b1 and b2 yield 8 possible quantized values; the one with the smallest difference from the original data is selected, and b0, b1, b2 are recorded as the quantized number. The algorithm thus quantizes 32-bit high-precision data into 3-bit low-bit-width data.
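The 8-level selection described above can be sketched in Python. This is a hypothetical illustration: the function name `quantize_value`, the restriction to non-negative inputs, and passing `scale` in directly are assumptions for the sketch; only the formula scale × (b0×1 + b1×k + b2×k²) comes from the patent.

```python
from itertools import product

def quantize_value(x, k, scale):
    """Quantize one non-negative weight x to scale * (b0*1 + b1*k + b2*k**2),
    with each b_i in {0, 1}: enumerate all 8 combinations and keep the closest.

    Returns the 3 recorded bits (b0, b1, b2) and the reconstructed value.
    """
    best_bits, best_val = (0, 0, 0), 0.0
    for bits in product((0, 1), repeat=3):          # the 8 candidate levels
        b0, b1, b2 = bits
        q = scale * (b0 * 1 + b1 * k + b2 * k ** 2)
        if abs(x - q) < abs(x - best_val):
            best_bits, best_val = bits, q
    return best_bits, best_val
```

With k = 2 and scale = 1 the representable levels are exactly 0 through 7, so for example `quantize_value(4.9, 2, 1.0)` returns the bits (1, 0, 1) and the value 5.0.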
The whole network weight layer is quantized with this algorithm, yielding one quantized weight layer, and inference is run to obtain an accuracy result. In this step the weights of the other layers are locked and not modified; the basis function is then replaced and quantization is repeated several times, yielding the weights quantized by each basis function and, by inference, the corresponding accuracy results. Finally, a search is performed: the basis function scheme with the lowest accuracy loss is selected, and the optimal quantization parameter configuration is obtained adaptively.
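A minimal self-contained sketch of this per-layer basis search follows. It is hypothetical in several respects: it uses squared reconstruction error as a stand-in for the inference accuracy described above, a simple max-based heuristic for choosing scale, and naive sign handling, none of which is fixed by the patent.

```python
def search_layer_basis(weights, candidate_ks=(2, 3, 5)):
    """Try each candidate base k, quantize the whole layer to the levels
    scale * (b0 + b1*k + b2*k**2), and keep the k with the lowest
    squared reconstruction error (proxy for accuracy loss)."""
    best = None
    for k in candidate_ks:
        # The representable non-negative levels for this basis (up to 8).
        unit_levels = sorted({b0 + b1 * k + b2 * k ** 2
                              for b0 in (0, 1) for b1 in (0, 1) for b2 in (0, 1)})
        # Hypothetical heuristic: map the largest magnitude to the top level.
        scale = max(abs(w) for w in weights) / max(unit_levels)
        levels = [scale * u for u in unit_levels]
        quantized = [min(levels, key=lambda q: abs(abs(w) - q))
                     * (1 if w >= 0 else -1) for w in weights]
        err = sum((w - q) ** 2 for w, q in zip(weights, quantized))
        if best is None or err < best[0]:
            best = (err, k, scale, quantized)
    _, k, scale, quantized = best
    return k, scale, quantized
```

For a toy layer [0.4, -3.0, 7.0] with candidates (2, 3), base k = 2 gives scale 1.0 and quantized values [0.0, -3.0, 7.0], a lower reconstruction error than k = 3, so the search selects k = 2.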
Step 5: record the optimal quantization parameter configuration obtained in step 4.
Step 6: check whether all layers have been quantized; if so, proceed to step 7, otherwise return to step 2 to quantize the next layer.
Step 7: after all layers have been quantized, collect the quantization parameter configurations recorded in step 5 and the quantized weight layers obtained.
Step 8: output the quantization method obtained in step 7, the quantization accuracy obtained by inference after the whole network is quantized by this method, and, as required, information such as the quantization compression ratio.
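To make the compression ratio concrete, here is a back-of-envelope figure for this embodiment: each weight is stored as the 3 recorded bits b0, b1, b2 instead of a 32-bit value. The per-layer overhead of storing scale and the chosen k is ignored, and the numbers are illustrative rather than taken from the patent.

```python
original_bits = 32     # each high-precision input weight
quantized_bits = 3     # one bit each for b0, b1, b2
ratio = original_bits / quantized_bits
print(f"approximate compression ratio: {ratio:.1f}x")  # prints: approximate compression ratio: 10.7x
```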
Step 9: end.
Based on the same inventive concept, the invention also provides an adaptive basis function superposition quantization system based on different network types, implementing the above adaptive basis function superposition quantization method. The system comprises an acquisition module, a quantization module, a judgment module and an output module. The acquisition module acquires the neural network weights. The quantization module quantizes the neural network weights with each of several candidate basis functions to obtain the quantized weights and accuracy results for each basis function, searches for the basis function scheme with the lowest accuracy loss, and adaptively searches for the optimal quantization parameter configuration. The judgment module repeats the above steps until all weight layers of the neural network are quantized, and collects the optimal quantization parameter configurations and quantized weight layers. The output module outputs the quantization result of the whole neural network.
The invention has at least the following technical effects: it expands the search space of existing quantization algorithms, and by exploiting the mathematical characteristics of different basis functions, the neural network loses less accuracy while keeping a high compression ratio, so quantization becomes more flexible and efficient. The invention meets the existing demand for compression quantization of neural networks and suggests a possible direction for the future development of compression quantization.
It should be noted that, for simplicity of description, the foregoing embodiments are all described as a series of combinations of actions, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously according to the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts referred to are not necessarily required for the present application.
In the above embodiments, the basic principle, main features and advantages of the present invention are described. It will be appreciated by persons skilled in the art that the present invention is not limited by the foregoing embodiments, which merely illustrate its principles; modifications and changes can be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications fall within the scope of the appended claims.

Claims (10)

1. The adaptive basis function superposition quantization method based on different network types is characterized by comprising the following steps:
S1: acquiring the neural network weights;
S2: quantizing the neural network weights with each of several candidate basis functions to obtain the quantized weights and accuracy results for each basis function, then searching for the basis function scheme with the lowest accuracy loss and adaptively searching for the optimal quantization parameter configuration;
S3: repeating the above steps until all weight layers of the neural network are quantized, then collecting the optimal quantization parameter configurations and the quantized weight layers;
S4: outputting the quantization result for the whole neural network.
2. The adaptive basis function superposition quantization method based on different network types according to claim 1, wherein step S1 comprises the sub-steps of:
S11: for any given neural network, confirming its network structure;
S12: analyzing the given neural network and separating out each weight layer;
S13: taking out one layer of network weights, following the order of the network structure, as the network layer to be quantized.
3. The adaptive basis function superposition quantization method based on different network types according to claim 1, wherein the basis functions comprise a polynomial basis, namely 1, k and k²; different values of k are specified, where k is a natural number.
4. The adaptive basis function superposition quantization method based on different network types according to claim 3, wherein the quantization formula is: scale × (b0×1 + b1×k + b2×k²), where b0, b1 and b2 each take the value 0 or 1, scale is a scaling factor given globally within the current weight layer, and the different combinations of b0, b1 and b2 yield 8 different quantization levels.
5. The adaptive basis function superposition quantization method based on different network types according to claim 4, wherein the quantization level with the smallest difference from the original neural network weight is selected, and b0, b1 and b2 are recorded as the quantized number, so that high-precision data are quantized into low-bit-width data.
6. The adaptive basis function superposition quantization method based on different network types according to claim 1, wherein step S2 is specifically: quantizing the whole weight layer of the neural network with the quantization algorithm to obtain a fully quantized weight layer and its accuracy result, while the weights of the other weight layers are locked and not modified; then replacing the basis function and repeating the quantization to obtain the quantized weights and accuracy results of the different basis functions.
7. The adaptive basis function superposition quantization method based on different network types according to claim 6, wherein after the quantized weights and accuracy results of the different basis functions are obtained, a search is performed: the basis function scheme with the lowest accuracy loss is selected, and the optimal quantization parameter configuration is obtained by adaptive search.
8. The adaptive basis function superposition quantization method based on different network types according to claim 1, wherein step S3 comprises the sub-steps of:
S31: checking whether all weight layers have been quantized; if so, entering step S32, otherwise returning to step S1;
S32: after all weight layers have been quantized, collecting the quantization parameter configurations recorded in step S2 and the quantized weight layers obtained.
9. The adaptive basis function superposition quantization method based on different network types according to claim 1, wherein step S4 is specifically: outputting the quantization methods obtained in steps S1-S3, the quantization accuracy of the whole quantized neural network, and the quantization compression ratio.
10. The adaptive basis function superposition quantization system based on different network types, used for realizing the adaptive basis function superposition quantization method based on different network types according to any one of claims 1-9, is characterized by comprising an acquisition module, a quantization module, a judgment module and an output module, wherein the acquisition module acquires the neural network weights; the quantization module quantizes the neural network weights with each of several candidate basis functions to obtain the quantized weights and accuracy results for each basis function, searches for the basis function scheme with the lowest accuracy loss, and adaptively searches for the optimal quantization parameter configuration; the judgment module repeats the above steps until all weight layers of the neural network are quantized, and collects the optimal quantization parameter configurations and quantized weight layers; the output module outputs the quantization result of the whole neural network.
CN202310172732.8A 2023-02-28 2023-02-28 Adaptive basis function superposition quantization method and system based on different network types Pending CN116306837A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310172732.8A CN116306837A (en) 2023-02-28 2023-02-28 Adaptive basis function superposition quantization method and system based on different network types

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310172732.8A CN116306837A (en) 2023-02-28 2023-02-28 Adaptive basis function superposition quantization method and system based on different network types

Publications (1)

Publication Number Publication Date
CN116306837A true CN116306837A (en) 2023-06-23

Family

ID=86784494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310172732.8A Pending CN116306837A (en) 2023-02-28 2023-02-28 Adaptive basis function superposition quantization method and system based on different network types

Country Status (1)

Country Link
CN (1) CN116306837A (en)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination