CN117036860B - Model quantization method, electronic device and medium - Google Patents

Model quantization method, electronic device and medium

Info

Publication number
CN117036860B
Authority
CN
China
Prior art keywords
pictures
quantization
model
full
calibration data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311031534.6A
Other languages
Chinese (zh)
Other versions
CN117036860A (en)
Inventor
徐裕民
陶林
林峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yixing Intelligent Technology Guangzhou Co ltd
Original Assignee
Yixing Intelligent Technology Guangzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yixing Intelligent Technology Guangzhou Co ltd
Priority to CN202311031534.6A
Publication of CN117036860A
Application granted
Publication of CN117036860B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a model quantization method. The full training data pictures are first scored and sorted by score from high to low. The top-ranked pictures are selected, M pictures are removed from them, and the remaining pictures are used as calibration data. The model is then quantized on the calibration data under different quantization strategies, an optimal quantization strategy is selected, and the model is quantized with the full data using that strategy to obtain the final quantized model. The method can improve the stability of model accuracy, shorten the time spent on quantization, and thereby improve efficiency and save bandwidth resources.

Description

Model quantization method, electronic device and medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a model quantization method, an electronic device, and a medium.
Background
Model quantization is a model compression technique that converts floating-point storage or operations into integer storage or operations. With a suitable bit width, the quantized model loses little accuracy while occupying less storage and bandwidth, so inference can run faster. Post-training quantization (PTQ) directly converts a trained single-precision floating-point (FP32) network into a fixed-point network. The process requires only a small amount of calibration data and no training of the original model.
Currently, PTQ methods typically select some unlabeled data at random from the training set as calibration data. However, random selection may make model accuracy unstable because of activation distribution mismatch. Alternatively, the full training data may be used directly as calibration data, but the quantization process then takes a relatively long time.
Disclosure of Invention
In view of some or all of the problems in the prior art, a first aspect of the present invention provides a model quantization method, including:
scoring the full training data pictures, and sorting them by score from high to low;
selecting the top L pictures by score, wherein L = M + N, N and M are natural numbers, N is the number of pictures expected to serve as calibration data, and M is the number of redundant pictures;
removing M pictures from the L pictures, and using the remaining pictures as calibration data;
selecting different quantization strategies, and quantizing the model based on the calibration data; and
selecting an optimal quantization strategy, and quantizing the full data by adopting the optimal quantization strategy to obtain a quantization model.
Further, scoring the full training data pictures includes:
determining the cluster centroids of the activation values of each layer in the model based on the full training data pictures; and
calculating the shortest distance between the activation value of each picture in the model and the cluster centroids, and taking the shortest distance as the score of the picture.
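The scoring step can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the activations of each layer have already been collected into one array per layer, uses a plain Lloyd's k-means written in NumPy, and sums the per-layer shortest distances into a single score. The function names, the number of clusters `k`, and the summation across layers are all assumptions, since the text does not specify them.

```python
import numpy as np

def kmeans_centroids(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids of shape (k, d)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign every sample to its nearest centroid.
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def score_pictures(layer_activations, k=4):
    """layer_activations: list of arrays, one per layer, each (num_pictures, d_layer).

    Assumption: the per-layer shortest distances are summed into one score.
    """
    num_pictures = layer_activations[0].shape[0]
    scores = np.zeros(num_pictures)
    for acts in layer_activations:
        acts = np.asarray(acts, dtype=float)
        centroids = kmeans_centroids(acts, k)
        dists = np.linalg.norm(acts[:, None, :] - centroids[None, :, :], axis=2)
        scores += dists.min(axis=1)  # shortest distance to any centroid
    return scores

def top_l(scores, l):
    """Indices of the l highest-scoring pictures, sorted from high to low."""
    return np.argsort(-np.asarray(scores), kind="stable")[:l]
```

With scores in hand, `top_l(scores, N + M)` returns the L candidate pictures from which M are later removed.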
Further, the number of pictures expected to serve as calibration data is determined according to an expected quantization duration, including:
calculating the duration required to quantize the model to be quantized using the full training data pictures as calibration data; and
determining the number of pictures expected to serve as calibration data from the expected quantization duration, according to the proportional relation between that duration and the total number of full training data pictures.
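The proportional relation above reduces to a one-line calculation. The sketch below assumes the two durations are given in the same unit. The function name, the rounding down to an integer, and the clamping to the range from 1 to the full picture count are illustrative assumptions not stated in the text.

```python
def calibration_count(n_total, t_full, t_expected):
    """Number of calibration pictures N = n_total * t_expected / t_full.

    n_total:    total number of full training data pictures
    t_full:     duration needed to quantize with all pictures as calibration data
    t_expected: quantization duration the user is willing to accept
    """
    if n_total <= 0 or t_full <= 0 or t_expected <= 0:
        raise ValueError("all arguments must be positive")
    n = int(n_total * t_expected / t_full)  # round down (assumption)
    return max(1, min(n, n_total))          # clamp to a usable range (assumption)
```

For example, with 50,000 pictures, a full-data quantization time of 1,000 minutes, and an acceptable time of 2 minutes, `calibration_count(50000, 1000, 2)` gives 100 pictures.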
Further, removing M pictures from the L pictures includes:
rejecting pictures with scores higher than a preset value; and
deleting M-K pictures starting from the pictures with the lowest scores, wherein K is the number of pictures with scores higher than the preset value.
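The two-stage removal can be sketched as below, assuming the scores are held in a plain list indexed by picture. The function name and the behaviour when K exceeds M (fewer than N pictures are returned) are assumptions, because the text does not cover that case.

```python
def select_calibration(scores, n, m, preset):
    """Return indices of the calibration pictures, sorted by score from high to low.

    scores: list of per-picture scores
    n:      number of pictures expected as calibration data
    m:      number of redundant pictures (L = n + m)
    preset: scores above this value are treated as abnormal
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = order[: n + m]                             # the L highest-scoring pictures
    kept = [i for i in top if scores[i] <= preset]   # reject abnormal scores
    k = len(top) - len(kept)                         # K pictures were rejected
    extra = max(m - k, 0)                            # M-K pictures still to delete
    # kept is sorted from high to low, so the lowest scores sit at the end.
    return kept[: len(kept) - extra]
```

For instance, `select_calibration([9.0, 1.0, 5.0, 7.0, 3.0, 8.0, 2.0], n=3, m=2, preset=8.5)` first takes the five highest scores, rejects the picture scoring 9.0 (K = 1), and then drops the one lowest-scoring remaining picture, leaving three calibration pictures.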
Further, removing M pictures from the L pictures includes:
removing M pictures from the L pictures by an outlier screening method.
Further, the quantization strategies include: per-tensor quantization, per-channel quantization, quantization based on relative entropy (KL divergence, Kullback-Leibler divergence), and hybrid quantization.
Based on the model quantization method as described above, a second aspect of the invention provides an electronic device of model quantization, comprising a memory and a processor, wherein the memory is configured to store a computer program which, when run by the processor, performs the model quantization method as described above.
The third aspect of the present invention also provides a computer readable storage medium storing a computer program which, when run on a processor, performs a model quantization method as described above.
According to the model quantization method, the electronic device and the medium, the calibration data is selected from the full training data by a clustering method, so that the activation value distribution of the calibration data matches the model and the stability of model accuracy is improved. In addition, because quantization strategies are compared on the calibration data, the time spent on each quantization run is shortened, efficiency is improved, and bandwidth resources are saved. Quantization with the full data is then carried out using the selected quantization strategy, so that an optimal quantization model can be obtained.
Drawings
To further clarify the above and other advantages and features of embodiments of the present invention, a more particular description of embodiments of the invention will be rendered by reference to the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. In the drawings, for clarity, the same or corresponding parts will be designated by the same or similar reference numerals.
Fig. 1 shows a flow chart of a model quantization method according to an embodiment of the present invention.
Detailed Description
In the following description, the present invention is described with reference to various embodiments. One skilled in the relevant art will recognize, however, that the embodiments may be practiced without one or more of the specific details, or with other alternative and/or additional methods or components. In other instances, well-known structures or operations are not shown or described in detail to avoid obscuring aspects of the invention. Similarly, for purposes of explanation, specific numbers and configurations are set forth in order to provide a thorough understanding of embodiments of the present invention. However, the invention is not limited to these specific details.
Reference throughout this specification to "one embodiment" or "the embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
It should be noted that the embodiments of the present invention describe the steps of the method in a specific order. This is merely for the purpose of illustrating the specific embodiments and does not limit the order of the steps. On the contrary, in different embodiments of the present invention, the order of the steps may be adjusted according to actual requirements.
The invention addresses two problems in the PTQ process: model accuracy is unstable when calibration pictures are selected at random, and using the full data as calibration pictures takes a long time. The invention provides a model quantization method based on optimized calibration data. It selects pictures from the full training data by a clustering method, then removes abnormal data to obtain the final calibration data, and then selects an optimal quantization strategy based on the calibration data. The method can effectively improve the stability of model accuracy and the efficiency of the system.
The technical scheme of the invention is further described below with reference to the accompanying drawings of the embodiments.
Fig. 1 shows a flow chart of a model quantization method according to an embodiment of the present invention. As shown in fig. 1, a model quantization method includes:
First, at step 101, the pictures are scored. The full training data pictures are scored so that they can be sorted by score from high to low. In one embodiment of the invention, the calibration data is selected from the full training data pictures by a clustering method. To avoid a mismatch in the activation value distribution of the calibration data, in one embodiment of the invention a picture is chosen as calibration data according to its activation values in the model. Specifically, the cluster centroids of the activation values of each layer in the model are first determined based on the full training data pictures. The shortest distance between the activation value of each picture in the model and the cluster centroids is then calculated and taken as the score of the picture, so that the farther a picture lies from the cluster centroids, the higher its score.
Next, at step 102, the pictures are sorted. The full training data pictures are sorted from high to low by the scores calculated in step 101, and the top L pictures are selected, where L is a natural number.
Next, at step 103, abnormal scores are removed. In the embodiments of the present invention, excessively high scores are regarded as abnormal scores. They affect quantization accuracy, so the corresponding pictures need to be removed. For this reason, in one embodiment of the present invention, the value of L is generally greater than the number N of pictures expected as calibration data. For example, when 100 pictures are expected as calibration data, L may be 110.
In one embodiment of the invention, pictures are rejected according to the definition of an abnormal score, so that N pictures of final calibration data are obtained. Specifically, the pictures with scores higher than a preset value are removed. Because the number K of such pictures may be smaller than L-N, a further L-N-K pictures are deleted starting from the picture with the lowest score, so that only N pictures are retained as calibration data.
In another embodiment of the present invention, a common outlier screening method may be used to reject L-N pictures from the L pictures to obtain the final N pictures of calibration data. Outlier screening methods include methods based on statistics, clustering, classification, information theory, distance, density and the like. The statistics-based method presets a model for the scores of the L pictures, judges which scores are abnormal according to how well each object in the data set fits the preset model, and rejects them. The clustering-based method applies a clustering algorithm to the scores of the L pictures, identifies points that do not belong to any cluster as abnormal scores, and rejects them. The classification-based method applies a classification algorithm to determine the abnormal scores. The information-theory-based method applies information theory to the detection of abnormal scores. In the distance-based method, a score is regarded as abnormal, and is rejected, when more than a certain proportion of the other scores lie farther from it than a preset distance. The density-based method calculates a local outlier factor for each score from the density of the scores of the L pictures, uses it to measure the degree of outlierness, and finally rejects the L-N scores with the largest degree of outlierness as abnormal scores.
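As one possible instance of the statistics-based screening described above, the sketch below presumes that the scores roughly follow a normal distribution and rejects the L-N scores that deviate most from the mean. The choice of a z-score criterion and the function name are assumptions. Note that this criterion is two-sided, so it can also reject unusually low scores, whereas the embodiment above regards only excessively high scores as abnormal.

```python
import numpy as np

def reject_by_zscore(scores, n_keep):
    """Keep the n_keep scores that fit a normal model best.

    Returns the indices of the retained pictures in ascending index order.
    """
    scores = np.asarray(scores, dtype=float)
    std = scores.std()
    if std == 0:
        z = np.zeros_like(scores)              # all scores identical: no outliers
    else:
        z = np.abs(scores - scores.mean()) / std
    keep = np.argsort(z, kind="stable")[:n_keep]   # smallest deviations first
    return np.sort(keep)
```

For the scores `[1.0, 1.1, 0.9, 1.05, 10.0, 0.95]` with `n_keep=5`, the picture scoring 10.0 is the one rejected.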
Further, since the time required for quantization is proportional to the number of calibration pictures, in one embodiment of the present invention the number N of pictures expected as calibration data is determined according to the expected quantization duration. Specifically, the duration $T_1$ needed to quantize the model with the full training data pictures as calibration data is first estimated from the full training data pictures specified by the user and the model to be quantized. The number N of pictures expected as calibration data is then determined from the quantization duration $T_2$ acceptable to the user:
$$N = N_1 T_2 / T_1$$
where $N_1$ is the total number of full training data pictures.
Next, at step 104, an optimal quantization strategy is selected. Based on the calibration data obtained in step 103, the model is quantized under different quantization strategies, and an optimal quantization strategy is selected. Since the calibration data contains few pictures, each quantization run takes relatively little time. In one embodiment of the present invention, the quantization strategies include a per-tensor quantization strategy, a per-channel quantization strategy, a quantization strategy based on relative entropy (KL divergence, Kullback-Leibler divergence), a hybrid quantization strategy, and the like. The per-tensor quantization strategy uses one set of quantization parameters (scale, zero-point) for the entire model. The per-channel quantization strategy uses one set of quantization parameters (scale, zero-point) for each channel in the model. Compared with per-tensor quantization, per-channel quantization requires more quantization parameters to be stored but offers finer granularity.
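The difference in granularity between the two strategies can be illustrated on a single weight array. The sketch below assumes 8-bit asymmetric (affine) quantization with min/max ranges. The function names and the quantize-then-dequantize form are illustrative choices, and the shared parameter set is shown for one array rather than for a whole model.

```python
import numpy as np

def affine_params(x_min, x_max, num_bits=8):
    """Scale and zero-point mapping [x_min, x_max] onto [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    x_min = np.minimum(x_min, 0.0)   # the range must contain zero
    x_max = np.maximum(x_max, 0.0)
    scale = (x_max - x_min) / qmax
    scale = np.where(scale == 0, 1.0, scale)   # avoid division by zero
    zero_point = np.clip(np.round(-x_min / scale), 0, qmax).astype(np.int32)
    return scale, zero_point

def quantize_dequantize(w, scale, zero_point, num_bits=8):
    """Quantize to integers, then map back to floats to expose the error."""
    q = np.clip(np.round(w / scale) + zero_point, 0, 2 ** num_bits - 1)
    return (q - zero_point) * scale

def per_tensor(w):
    """One (scale, zero-point) pair shared by every element of w."""
    scale, zero_point = affine_params(w.min(), w.max())
    return quantize_dequantize(w, scale, zero_point)

def per_channel(w, axis=0):
    """One (scale, zero-point) pair per slice along `axis`."""
    reduce_axes = tuple(i for i in range(w.ndim) if i != axis)
    scale, zero_point = affine_params(
        w.min(axis=reduce_axes, keepdims=True),
        w.max(axis=reduce_axes, keepdims=True),
    )
    return quantize_dequantize(w, scale, zero_point)
```

When channels have very different value ranges, the shared scale of `per_tensor` is dominated by the widest channel and small-valued channels lose most of their resolution. `per_channel` stores one parameter pair per channel and avoids this.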
In one embodiment of the present invention, a quantization strategy is evaluated by comparing, for each layer of the model, the floating-point output vector before quantization with the fixed-point output vector after quantization, using cosine similarity and/or Euclidean distance and the like. The closer the cosine similarity is to 1, and/or the closer the Euclidean distance is to 0, the better the corresponding quantization strategy performs. It should be appreciated that in other embodiments of the present invention, other evaluation metrics and/or methods may be employed to evaluate the quantization strategy.
Finally, at step 105, the model is quantized. The full data is quantized by adopting the optimal quantization strategy to obtain a quantization model.
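The layer-by-layer evaluation used to choose the strategy in step 104 can be sketched as follows. The sketch assumes the per-layer outputs of the floating-point model and of each quantized candidate have already been collected on the calibration data. The function names, the dictionary layout, and the use of the mean cosine similarity across layers as the ranking criterion are assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity of two flattened outputs; 1.0 means identical direction."""
    a = np.ravel(a).astype(float)
    b = np.ravel(b).astype(float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def euclidean_distance(a, b):
    """Euclidean distance of two flattened outputs; 0.0 means identical values."""
    a = np.ravel(a).astype(float)
    b = np.ravel(b).astype(float)
    return float(np.linalg.norm(a - b))

def pick_best_strategy(float_outputs, quantized_outputs_by_strategy):
    """float_outputs: list of per-layer arrays from the floating-point model.
    quantized_outputs_by_strategy: dict mapping a strategy name to the
    per-layer arrays produced by the model quantized with that strategy.
    """
    best_name, best_score = None, -np.inf
    for name, q_outputs in quantized_outputs_by_strategy.items():
        # Assumption: average the per-layer cosine similarities into one score.
        score = float(np.mean([cosine_similarity(f, q)
                               for f, q in zip(float_outputs, q_outputs)]))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

The strategy returned by `pick_best_strategy` would then be the one applied in step 105. `euclidean_distance` can replace or complement the cosine score in the same loop, with smaller values ranking higher.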
Based on the model quantization method as described above, the invention further provides an electronic device for model quantization, comprising a memory and a processor, wherein the memory is configured to store a computer program which, when run by the processor, performs the model quantization method as described above.
Furthermore, the present invention provides a computer readable storage medium storing a computer program which, when run on a processor, performs a model quantization method as described above.
According to the model quantization method, the electronic device and the medium, the calibration data is selected from the full training data by a clustering method, so that the activation value distribution of the calibration data matches the model and the stability of model accuracy is improved. In addition, because quantization strategies are compared on the calibration data, the time spent on each quantization run is shortened, efficiency is improved, and bandwidth resources are saved. Quantization with the full data is then carried out using the selected quantization strategy, so that an optimal quantization model can be obtained.
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to those skilled in the relevant art that various combinations, modifications, and variations can be made therein without departing from the spirit and scope of the invention. Thus, the breadth and scope of the present invention as disclosed herein should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims (7)

1. A method for model quantization, comprising the steps of:
scoring the full training data pictures, and sorting the full training data pictures by score from high to low;
selecting the top L pictures by score, wherein L = M + N, N and M are natural numbers, N is the number of pictures expected to be used as calibration data, and M is the number of redundant pictures, the value of N being determined according to the expected quantization duration by:
calculating the duration $T_1$ required to quantize the model to be quantized using the full training data pictures as calibration data; and
determining, according to the proportional relation between the duration $T_1$ and the total number $N_1$ of full training data pictures, and based on the expected quantization duration $T_2$, the number N of pictures expected to be used as calibration data:
$$N = N_1 T_2 / T_1$$
removing M pictures from the L pictures, and using the remaining N pictures as calibration data;
quantizing a model based on the calibration data using a plurality of quantization strategies; and
selecting an optimal quantization strategy from the plurality of quantization strategies, and quantizing the full data by adopting the optimal quantization strategy to obtain a quantization model.
2. The model quantization method of claim 1, wherein scoring the full training data pictures comprises the steps of:
determining the cluster centroids of the activation values of each layer in the model based on the full training data pictures; and
calculating the shortest distance between the activation value of each picture in the model and the cluster centroids, and taking the shortest distance as the score of the picture.
3. The model quantization method according to claim 1, wherein removing M pictures from the L pictures comprises the steps of:
rejecting pictures with scores higher than a preset value; and
deleting M-K pictures starting from the pictures with the lowest scores, wherein K is the number of pictures with scores higher than the preset value.
4. The model quantization method of claim 1, wherein removing M pictures from the L pictures comprises:
removing M pictures from the L pictures by an outlier screening method.
5. The model quantization method of claim 1, wherein the quantization strategies comprise: per-tensor quantization, per-channel quantization, relative-entropy-based quantization, and hybrid quantization.
6. An electronic device for model quantization, comprising a memory and a processor, wherein the memory is configured to store a computer program which, when run by the processor, performs the model quantization method according to any of claims 1 to 5.
7. A computer readable storage medium storing a computer program which, when run on a processor, performs the model quantization method according to any one of claims 1 to 5.
CN202311031534.6A 2023-08-15 2023-08-15 Model quantization method, electronic device and medium Active CN117036860B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311031534.6A CN117036860B (en) 2023-08-15 2023-08-15 Model quantization method, electronic device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311031534.6A CN117036860B (en) 2023-08-15 2023-08-15 Model quantization method, electronic device and medium

Publications (2)

Publication Number Publication Date
CN117036860A CN117036860A (en) 2023-11-10
CN117036860B true CN117036860B (en) 2024-02-09

Family

ID=88626013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311031534.6A Active CN117036860B (en) 2023-08-15 2023-08-15 Model quantization method, electronic device and medium

Country Status (1)

Country Link
CN (1) CN117036860B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408696A (en) * 2021-05-17 2021-09-17 珠海亿智电子科技有限公司 Fixed point quantization method and device of deep learning model
WO2022078002A1 (en) * 2020-10-16 2022-04-21 浪潮(北京)电子信息产业有限公司 Image processing method and apparatus, device, and readable storage medium
CN115238874A (en) * 2022-09-22 2022-10-25 深圳市友杰智新科技有限公司 Quantization factor searching method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN117036860A (en) 2023-11-10

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant