CN116306837A - Adaptive basis function superposition quantization method and system based on different network types - Google Patents


Info

Publication number: CN116306837A
Application number: CN202310172732.8A
Authority: CN (China)
Prior art keywords: quantization, quantized, neural network, weight, basis function
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 李涛, 熊大鹏, 胡建伟
Assignee (current and original): Suzhou Yizhu Intelligent Technology Co ltd
Filing date: 2023-02-28
Publication date: 2023-06-23

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses an adaptive basis function superposition quantization method and system based on different network types, belonging to the technical field of neural network model compression. The method comprises the following steps: S1: acquire the neural network weights; S2: quantize the neural network weights with each of several candidate basis functions to obtain the quantized weights and accuracy results for each basis function, then search for the basis function scheme with the lowest accuracy loss and adaptively search for the optimal quantization parameter configuration; S3: repeat the above steps until all weight layers of the neural network are quantized, then collect the optimal quantization parameter configurations and the quantized weight layers; S4: output the quantization result for the whole neural network. The invention reduces the loss of network accuracy while keeping a high compression ratio, making quantization more flexible and efficient.

Description

Adaptive basis function superposition quantization method and system based on different network types
Technical Field
The invention relates to the technical field of neural network model compression, and in particular to an adaptive basis function superposition quantization method and system based on different network types.
Background
With the rapid development of neural networks, the number of parameters in neural network models keeps growing, and with it the energy consumption and running time of the networks. For the many users with limited resources, it has become extremely important to compress existing large models to fit their actual circumstances.
Common compression methods for neural network models include quantization, knowledge distillation, and pruning. The invention focuses on quantization, i.e., converting high-precision, high-bit-width original data into low-bit-width data through some algorithm or paradigm, thereby reducing the model size while preserving model accuracy. In current research and production, the common quantization methods are uniform and non-uniform quantization. A representative uniform scheme is INT quantization, which rounds data to a uniformly spaced grid of fixed pitch. A common non-uniform approach is additive powers-of-two quantization (APoT), which quantizes each value to a superposition of powers of 2 and achieves the quantization effect by recording the exponents instead of the original data.
Most existing quantization schemes carry out network quantization with 2 as the base. However, studying the weights of existing neural networks shows that their distribution is uncertain, and the range of weights that a base-2 quantization scheme can fit is quite limited, which can potentially reduce accuracy after quantization.
Disclosure of Invention
The invention aims to provide an adaptive basis function superposition quantization method and system based on different network types, so as to solve the technical problem of achieving higher post-quantization accuracy at the same quantization bit width.
The invention is realized by adopting the following technical scheme: the adaptive basis function superposition quantization method based on different network types comprises the following steps:
S1: acquire the neural network weights;
S2: quantize the neural network weights with each of several candidate basis functions to obtain the quantized weights and accuracy results for each basis function, then search for the basis function scheme with the lowest accuracy loss and adaptively search for the optimal quantization parameter configuration;
S3: repeat the above steps until all weight layers of the neural network are quantized, then collect the optimal quantization parameter configurations and the quantized weight layers;
S4: output the quantization result for the whole neural network.
Further, step S1 includes the following sub-steps:
S11: for any given neural network, confirm its network structure;
S12: analyze the given neural network and separate out each weight layer;
S13: take out one layer of network weights, following the order of the network structure, as the network layer to be quantized.
Further, the basis functions include a polynomial basis, namely 1, k and k²; different values of k are specified, where k is a natural number.
Further, the quantization formula is: scale × (b0×1 + b1×k + b2×k²), where b0, b1 and b2 each take the value 0 or 1, scale is a scaling factor given globally within the current weight layer, and the different combinations of b0, b1 and b2 yield 8 different quantization levels.
Further, the quantization level with the smallest difference from the original neural network weight is selected, and b0, b1 and b2 are recorded as the quantized number, so that high-precision data are quantized into low-bit-width data.
Further, step S2 is specifically: quantize the whole weight layer of the neural network with the quantization algorithm to obtain a fully quantized weight layer and its accuracy result; during this process the weights of the other weight layers are locked and not modified. Then replace the basis function and repeat the quantization several times to obtain the quantized weights and accuracy results of the different basis functions.
Further, after the quantized weights and accuracy results of the different basis functions are obtained, a search is performed: the basis function scheme with the lowest accuracy loss is selected, and the optimal quantization parameter configuration is obtained by adaptive search.
Further, step S3 includes the following sub-steps:
S31: check whether all weight layers have been quantized; if so, proceed to step S32, otherwise return to step S1;
S32: after all weight layers have been quantized, collect the quantization parameter configurations recorded in step S2 and the quantized weight layers obtained.
Further, step S4 is specifically: output the quantization methods obtained in steps S1-S3, the quantization accuracy of the whole quantized neural network, and the quantization compression ratio.
The adaptive basis function superposition quantization system based on different network types comprises an acquisition module, a quantization module, a judgment module and an output module. The acquisition module acquires the neural network weights. The quantization module quantizes the neural network weights with each of several candidate basis functions to obtain the quantized weights and accuracy results for each basis function, searches for the basis function scheme with the lowest accuracy loss, and adaptively searches for the optimal quantization parameter configuration. The judgment module repeats the above steps until all weight layers of the neural network are quantized, and collects the optimal quantization parameter configurations and quantized weight layers. The output module outputs the quantization result of the whole neural network.
The invention has the following beneficial effects: the invention expands the search space of existing quantization algorithms, and by exploiting the mathematical characteristics of different basis functions, the neural network loses less accuracy while keeping a high compression ratio, so quantization becomes more flexible and efficient. The invention meets the existing demand for compression quantization of neural networks and suggests a possible direction for the future development of compression quantization.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the present invention;
fig. 2 is a flowchart of a basis function quantization algorithm.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Example 1
Referring to fig. 1, the adaptive basis function superposition quantization method based on different network types performs adaptive basis function superposition quantization for different network structures, and specifically comprises the following steps:
Step 0: input any neural network and confirm its network structure.
Step 1: analyze the given neural network and separate out each weight layer. Depending on the network structure, this may yield tens or hundreds of layers of neural network weights awaiting quantization.
Step 2: take out one layer of network weights, following the order of the network structure, as the network layer to be quantized.
Step 3: select the basis functions to be used. In this embodiment a polynomial basis is selected, i.e., 1, k, k², and different values of k are specified, such as k = 2, 3, 5, ….
Step 4: quantize and search the network weights selected in step 2 using the different basis functions given in step 3.
In this embodiment, step 4 is specifically as follows. First, see fig. 2 for a schematic diagram of the adaptive basis function quantization algorithm for a given basis function: for 32-bit raw data x as input and a given k, the algorithm quantizes x into the form scale × (b0×1 + b1×k + b2×k²), where b0, b1 and b2 may each be 0 or 1, and scale is a scaling factor given globally for the layer. The combinations of b0, b1 and b2 yield 8 possible quantized values; the one with the smallest difference from the original data is selected, and b0, b1, b2 are recorded as the quantized number. The algorithm thus quantizes 32-bit high-precision data into 3-bit low-bit-width data.
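The 8-level selection described above can be sketched in Python. This is a hypothetical illustration: the function name `quantize_value`, the restriction to non-negative inputs, and passing `scale` in directly are assumptions for the sketch; only the formula scale × (b0×1 + b1×k + b2×k²) comes from the patent.

```python
from itertools import product

def quantize_value(x, k, scale):
    """Quantize one non-negative weight x to scale * (b0*1 + b1*k + b2*k**2),
    with each b_i in {0, 1}: enumerate all 8 combinations and keep the closest.

    Returns the 3 recorded bits (b0, b1, b2) and the reconstructed value.
    """
    best_bits, best_val = (0, 0, 0), 0.0
    for bits in product((0, 1), repeat=3):          # the 8 candidate levels
        b0, b1, b2 = bits
        q = scale * (b0 * 1 + b1 * k + b2 * k ** 2)
        if abs(x - q) < abs(x - best_val):
            best_bits, best_val = bits, q
    return best_bits, best_val
```

With k = 2 and scale = 1 the representable levels are exactly 0 through 7, so for example `quantize_value(4.9, 2, 1.0)` returns the bits (1, 0, 1) and the value 5.0.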
The whole network weight layer is quantized with this algorithm, yielding one quantized weight layer, and inference is run to obtain an accuracy result. In this step the weights of the other layers are locked and not modified; the basis function is then replaced and quantization is repeated several times, yielding the weights quantized by each basis function and, by inference, the corresponding accuracy results. Finally, a search is performed: the basis function scheme with the lowest accuracy loss is selected, and the optimal quantization parameter configuration is obtained adaptively.
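A minimal self-contained sketch of this per-layer basis search follows. It is hypothetical in several respects: it uses squared reconstruction error as a stand-in for the inference accuracy described above, a simple max-based heuristic for choosing scale, and naive sign handling, none of which is fixed by the patent.

```python
def search_layer_basis(weights, candidate_ks=(2, 3, 5)):
    """Try each candidate base k, quantize the whole layer to the levels
    scale * (b0 + b1*k + b2*k**2), and keep the k with the lowest
    squared reconstruction error (proxy for accuracy loss)."""
    best = None
    for k in candidate_ks:
        # The representable non-negative levels for this basis (up to 8).
        unit_levels = sorted({b0 + b1 * k + b2 * k ** 2
                              for b0 in (0, 1) for b1 in (0, 1) for b2 in (0, 1)})
        # Hypothetical heuristic: map the largest magnitude to the top level.
        scale = max(abs(w) for w in weights) / max(unit_levels)
        levels = [scale * u for u in unit_levels]
        quantized = [min(levels, key=lambda q: abs(abs(w) - q))
                     * (1 if w >= 0 else -1) for w in weights]
        err = sum((w - q) ** 2 for w, q in zip(weights, quantized))
        if best is None or err < best[0]:
            best = (err, k, scale, quantized)
    _, k, scale, quantized = best
    return k, scale, quantized
```

For a toy layer [0.4, -3.0, 7.0] with candidates (2, 3), base k = 2 gives scale 1.0 and quantized values [0.0, -3.0, 7.0], a lower reconstruction error than k = 3, so the search selects k = 2.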
Step 5: record the optimal quantization parameter configuration obtained in step 4.
Step 6: check whether all layers have been quantized; if so, proceed to step 7, otherwise return to step 2 to quantize the next layer.
Step 7: after all layers have been quantized, collect the quantization parameter configurations recorded in step 5 and the quantized weight layers obtained.
Step 8: output the quantization method obtained in step 7, the quantization accuracy obtained by inference after the whole network is quantized by this method, and, as required, information such as the quantization compression ratio.
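To make the compression ratio concrete, here is a back-of-envelope figure for this embodiment: each weight is stored as the 3 recorded bits b0, b1, b2 instead of a 32-bit value. The per-layer overhead of storing scale and the chosen k is ignored, and the numbers are illustrative rather than taken from the patent.

```python
original_bits = 32     # each high-precision input weight
quantized_bits = 3     # one bit each for b0, b1, b2
ratio = original_bits / quantized_bits
print(f"approximate compression ratio: {ratio:.1f}x")  # prints: approximate compression ratio: 10.7x
```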
Step 9: end.
Based on the same inventive concept, the invention also provides an adaptive basis function superposition quantization system based on different network types, implementing the above adaptive basis function superposition quantization method. The system comprises an acquisition module, a quantization module, a judgment module and an output module. The acquisition module acquires the neural network weights. The quantization module quantizes the neural network weights with each of several candidate basis functions to obtain the quantized weights and accuracy results for each basis function, searches for the basis function scheme with the lowest accuracy loss, and adaptively searches for the optimal quantization parameter configuration. The judgment module repeats the above steps until all weight layers of the neural network are quantized, and collects the optimal quantization parameter configurations and quantized weight layers. The output module outputs the quantization result of the whole neural network.
The invention has at least the following technical effects: it expands the search space of existing quantization algorithms, and by exploiting the mathematical characteristics of different basis functions, the neural network loses less accuracy while keeping a high compression ratio, so quantization becomes more flexible and efficient. The invention meets the existing demand for compression quantization of neural networks and suggests a possible direction for the future development of compression quantization.
It should be noted that, for simplicity of description, the foregoing embodiments are all described as a series of combinations of actions, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously according to the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts referred to are not necessarily required for the present application.
In the above embodiments, the basic principle, main features and advantages of the present invention are described. It will be appreciated by persons skilled in the art that the present invention is not limited by the foregoing embodiments, which merely illustrate its principles; modifications and changes can be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications fall within the scope of the appended claims.

Claims (10)

1. The adaptive basis function superposition quantization method based on different network types is characterized by comprising the following steps:
S1: acquiring the neural network weights;
S2: quantizing the neural network weights with each of several candidate basis functions to obtain the quantized weights and accuracy results for each basis function, then searching for the basis function scheme with the lowest accuracy loss and adaptively searching for the optimal quantization parameter configuration;
S3: repeating the above steps until all weight layers of the neural network are quantized, then collecting the optimal quantization parameter configurations and the quantized weight layers;
S4: outputting the quantization result for the whole neural network.
2. The adaptive basis function superposition quantization method based on different network types according to claim 1, wherein step S1 comprises the sub-steps of:
S11: for any given neural network, confirming its network structure;
S12: analyzing the given neural network and separating out each weight layer;
S13: taking out one layer of network weights, following the order of the network structure, as the network layer to be quantized.
3. The adaptive basis function superposition quantization method based on different network types according to claim 1, wherein the basis functions comprise a polynomial basis, namely 1, k and k²; different values of k are specified, where k is a natural number.
4. The adaptive basis function superposition quantization method based on different network types according to claim 3, wherein the quantization formula is: scale × (b0×1 + b1×k + b2×k²), where b0, b1 and b2 each take the value 0 or 1, scale is a scaling factor given globally within the current weight layer, and the different combinations of b0, b1 and b2 yield 8 different quantization levels.
5. The adaptive basis function superposition quantization method based on different network types according to claim 4, wherein the quantization level with the smallest difference from the original neural network weight is selected, and b0, b1 and b2 are recorded as the quantized number, so that high-precision data are quantized into low-bit-width data.
6. The adaptive basis function superposition quantization method based on different network types according to claim 1, wherein step S2 is specifically: quantizing the whole weight layer of the neural network with the quantization algorithm to obtain a fully quantized weight layer and its accuracy result, while the weights of the other weight layers are locked and not modified; then replacing the basis function and repeating the quantization to obtain the quantized weights and accuracy results of the different basis functions.
7. The adaptive basis function superposition quantization method based on different network types according to claim 6, wherein after the quantized weights and accuracy results of the different basis functions are obtained, a search is performed: the basis function scheme with the lowest accuracy loss is selected, and the optimal quantization parameter configuration is obtained by adaptive search.
8. The adaptive basis function superposition quantization method based on different network types according to claim 1, wherein step S3 comprises the sub-steps of:
S31: checking whether all weight layers have been quantized; if so, entering step S32, otherwise returning to step S1;
S32: after all weight layers have been quantized, collecting the quantization parameter configurations recorded in step S2 and the quantized weight layers obtained.
9. The adaptive basis function superposition quantization method based on different network types according to claim 1, wherein step S4 is specifically: outputting the quantization methods obtained in steps S1-S3, the quantization accuracy of the whole quantized neural network, and the quantization compression ratio.
10. The adaptive basis function superposition quantization system based on different network types, used for realizing the adaptive basis function superposition quantization method based on different network types according to any one of claims 1-9, is characterized by comprising an acquisition module, a quantization module, a judgment module and an output module, wherein the acquisition module acquires the neural network weights; the quantization module quantizes the neural network weights with each of several candidate basis functions to obtain the quantized weights and accuracy results for each basis function, searches for the basis function scheme with the lowest accuracy loss, and adaptively searches for the optimal quantization parameter configuration; the judgment module repeats the above steps until all weight layers of the neural network are quantized, and collects the optimal quantization parameter configurations and quantized weight layers; the output module outputs the quantization result of the whole neural network.
CN202310172732.8A 2023-02-28 2023-02-28 Adaptive basis function superposition quantization method and system based on different network types Pending CN116306837A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310172732.8A CN116306837A (en) 2023-02-28 2023-02-28 Adaptive basis function superposition quantization method and system based on different network types

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310172732.8A CN116306837A (en) 2023-02-28 2023-02-28 Adaptive basis function superposition quantization method and system based on different network types

Publications (1)

Publication Number Publication Date
CN116306837A true CN116306837A (en) 2023-06-23

Family

ID=86784494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310172732.8A Pending CN116306837A (en) 2023-02-28 2023-02-28 Adaptive basis function superposition quantization method and system based on different network types

Country Status (1)

Country Link
CN (1) CN116306837A (en)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination