CN114662672A - Neural network model quantization method, device, equipment and storage medium - Google Patents
- Publication number: CN114662672A
- Application number: CN202210326787.5A
- Authority: CN (China)
- Prior art keywords: training samples, sample, model, neural network, network model
- Prior art date: 2022-03-30
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology; G06N3/045—Combinations of networks
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F18/00—Pattern recognition; G06F18/20—Analysing; G06F18/24—Classification techniques; G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; G06N3/08—Learning methods
Abstract
The invention discloses a neural network model quantization method, apparatus, device, and storage medium, belonging to the technical field of model quantization. The method comprises the following steps: obtaining a plurality of training samples and a neural network model to be quantized; dividing the plurality of training samples into a plurality of sample groups; extracting a preset number of training samples from each of the plurality of sample groups to obtain a model quantization input sample set; and obtaining the dynamic ranges of the weights and activation values from the model quantization input sample set, determining quantization parameters, and quantizing the network to obtain a quantized neural network model. The method obtains a plurality of sample groups with different characteristics based on the features of the training samples, where each sample group corresponds to a certain feature of the original samples, and extracts training samples from the sample groups as the input of the model quantization stage, so that the model is quantized with an input sample set that is both diverse and representative, improving the accuracy of model quantization and the generalization capability of the quantized model.
Description
Technical Field
The present invention relates to the field of model quantization technologies, and in particular, to a neural network model quantization method, apparatus, device, and storage medium.
Background
In the related art, when a neural network model is quantized, some input samples need to be selected to complete model quantization; at the present stage, manual selection is often adopted, choosing samples that are presumed to be representative and of different types for subsequent quantization processing.
However, manual selection is highly subjective, making it difficult to guarantee the representativeness and diversity of the extracted samples; as a result, the dynamic ranges of the weights and activation values obtained in the subsequent quantization process are narrow, which leads to poor accuracy and weak generalization capability in the quantized model.
Disclosure of Invention
The invention mainly aims to provide a neural network model quantization method, apparatus, device, and storage medium, so as to solve the technical problems of poor quantization accuracy and weak generalization capability of the quantized model in the prior art.
According to a first aspect of the present invention, there is provided a neural network model quantification method, the method comprising:
obtaining a plurality of training samples and a neural network model to be quantized;
dividing the training samples into a plurality of sample groups according to the feature types of the training samples;
extracting a preset number of training samples from each of the plurality of sample groups to obtain a model quantization input sample set;
and obtaining the dynamic ranges of the weights and activation values from the model quantization input sample set, determining quantization parameters, quantizing the network, and obtaining a quantized neural network model.
Optionally, before the dividing the plurality of training samples into a plurality of sample groups according to the feature types of the plurality of training samples, the method further includes:
extracting feature values of the plurality of training samples;
clustering the feature values of the plurality of training samples with an unsupervised algorithm based on the distribution of the feature values, and selecting a preset classification number of central feature values from the feature values of the plurality of training samples;
and determining the feature types of the plurality of training samples according to the feature types corresponding to the central feature values.
Optionally, the dividing the plurality of training samples into a plurality of sample groups according to the feature types of the plurality of training samples includes:
for any non-central feature value, selecting, based on the distribution of the feature values of the plurality of training samples, a target central feature value closest to the non-central feature value from the preset classification number of central feature values;
and assigning the training samples corresponding to the non-central feature value to the sample group to which the training samples corresponding to the target central feature value belong.
Optionally, the extracting a preset number of training samples from each of the plurality of sample groups to obtain a model quantization input sample set includes:
for any sample group, randomly sampling the sample group to extract a preset number of training samples;
and obtaining the model quantization input sample set based on the training samples extracted from the plurality of sample groups.
Optionally, the extracting a preset number of training samples from each of the plurality of sample groups to obtain a model quantization input sample set includes:
for any sample group, determining the feature value distribution of the corresponding training samples;
extracting a preset number of training samples from the sample group according to a rule derived from the feature value distribution;
and obtaining the model quantization input sample set based on the training samples extracted from the plurality of sample groups.
Optionally, the obtaining the dynamic ranges of the weights and activation values from the model quantization input sample set, determining quantization parameters to quantize the network, and obtaining a quantized neural network model includes:
obtaining the dynamic ranges of the weights and activation values from a plurality of input samples in the model quantization input sample set, and quantizing the weights with a quantization tool to obtain quantized weights;
and quantizing the activation values according to the quantized weights and the plurality of input samples to obtain a quantized neural network model.
Optionally, after the obtaining the dynamic ranges of the weights and activation values from the model quantization input sample set, determining quantization parameters to quantize the network, and obtaining the quantized neural network model, the method further includes:
porting the quantized neural network model to an embedded device.
According to a second aspect of the present invention, there is provided a neural network model quantization apparatus, the apparatus including:
a sample acquisition module, configured to acquire a plurality of training samples and a neural network model to be quantized;
a sample dividing module, configured to divide the plurality of training samples into a plurality of sample groups according to the feature types of the plurality of training samples;
a sample sampling module, configured to extract a preset number of training samples from each of the plurality of sample groups to obtain a model quantization input sample set;
and a model quantization module, configured to obtain the dynamic ranges of the weights and activation values from the model quantization input sample set, determine quantization parameters to quantize the network, and obtain a quantized neural network model.
According to a third aspect of the present invention, there is provided a neural network model quantization device including: a memory, a processor, and a neural network model quantization program stored on the memory and executable on the processor, the neural network model quantization program, when executed by the processor, implementing the steps described in any of the possible implementations of the first aspect.
According to a fourth aspect of the present invention, there is provided a computer readable storage medium having stored thereon a neural network model quantization program, which when executed by a processor, implements the various steps described in any one of the possible implementations of the first aspect.
The embodiment of the invention provides a neural network model quantization method, apparatus, device, and storage medium. A plurality of training samples and a neural network model to be quantized are obtained by a neural network model quantization device; the plurality of training samples are divided into a plurality of sample groups according to their feature types; a preset number of training samples is extracted from each of the plurality of sample groups to obtain a model quantization input sample set; and the dynamic ranges of the weights and activation values are obtained from the model quantization input sample set, quantization parameters are determined, and the network is quantized to obtain a quantized neural network model.
The method classifies the training samples based on their features to obtain a plurality of sample groups with different characteristics, where each sample group represents a certain feature of the original samples. A certain number of training samples representing different features is extracted from the different sample groups and used as model input, yielding an input sample set that is both diverse and representative. Quantizing the model with this input sample set widens the dynamic ranges of the weights and activation values, improves the quantization accuracy of the model, and improves the generalization capability of the quantized model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a schematic structural diagram of a neural network model quantization device of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a neural network model quantization method according to a first embodiment of the present invention;
FIG. 3 is a detailed flowchart of the step S202 in FIG. 2 according to the present invention;
FIG. 4 is a detailed flowchart of the step S203 in FIG. 2 according to the present invention;
FIG. 5 is a detailed flowchart of the step S203 in FIG. 2 according to the present invention;
FIG. 6 is a detailed flowchart of the step S204 in FIG. 2 according to the present invention;
fig. 7 is a functional block diagram of a neural network model quantization apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiment of the invention is as follows: obtaining a plurality of training samples and a neural network model to be quantized; dividing the plurality of training samples into a plurality of sample groups according to the feature types of the plurality of training samples; extracting a preset number of training samples from each of the plurality of sample groups to obtain a model quantization input sample set; and obtaining the dynamic ranges of the weights and activation values from the model quantization input sample set, determining quantization parameters, quantizing the network, and obtaining a quantized neural network model.
In the related art, when a neural network model is quantized, some input samples need to be selected to complete model quantization; at the present stage, manual selection is often adopted, choosing samples that are presumed to be representative for subsequent quantization processing. However, manual selection is highly subjective, making it difficult to guarantee the representativeness and diversity of the extracted samples; as a result, the dynamic ranges of the weights and activation values obtained in the subsequent quantization process are narrow, which leads to poor accuracy and weak generalization capability in the quantized model.
The invention provides a solution, applied to a neural network model quantization device: the training samples are classified based on their features to obtain a plurality of sample groups with different characteristics, each sample group representing a certain feature of the original samples; a certain number of training samples representing different features is then extracted from the different sample groups and used as model input, yielding an input sample set that is both diverse and representative; finally, the model is quantized with this input sample set, which widens the dynamic ranges of the weights and activation values, improves the accuracy of model quantization, and improves the generalization capability of the quantized model.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Where "first" and "second" are used in the description and claims of embodiments of the invention to distinguish between similar elements and not necessarily for describing a particular sequential or chronological order, it is to be understood that such data may be interchanged where appropriate so that embodiments described herein may be implemented in other sequences than those illustrated or described herein.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a neural network model quantization device of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the neural network model quantization device may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the aforementioned processor 1001.
Those skilled in the art will appreciate that the architecture shown in fig. 1 does not constitute a limitation of the neural network model quantization device, which may include more or fewer components than shown, some components in combination, or a different arrangement of components.
As shown in fig. 1, the memory 1005, which is a storage medium, may include an operating system, a sample acquiring module, a sample processing module, a model quantizing module, and a neural network model quantizing program, wherein the sample processing module may be further refined into a sample dividing module and a sample sampling module.
In the neural network model quantization device shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The device calls the neural network model quantization program stored in the memory 1005 through the processor 1001 and executes the neural network model quantization method provided by the embodiment of the present invention.
Based on the above hardware structure but not limited to the above hardware structure, the present invention provides a first embodiment of a neural network model quantization method. Referring to fig. 2, fig. 2 is a flowchart illustrating a neural network model quantization method according to a first embodiment of the present invention.
In this embodiment, the method includes:
step S201, obtaining a plurality of training samples and a neural network model to be quantized;
in this embodiment, the execution subject is a neural network model quantization device, and the device may receive a training sample and a neural network model to be quantized, which are input by a user, in real time, or may retrieve the training sample and the neural network model to be quantized from a background database. After the neural network model to be quantized is obtained, the neural network model quantizing device can read various information such as a model structure, a weight parameter and the like of the model so as to facilitate subsequent processing. There may be many training samples input according to actual requirements, such as images, audio, text, and sequence data, which is not limited in this embodiment.
Step S202, dividing a plurality of training samples into a plurality of sample groups according to the feature types of the plurality of training samples;
A given set of training samples typically has many different features. For image samples, for example, the features that can be extracted include gray-level features (mean, variance, energy, gradient, histogram, information entropy, etc.), moment features, convolution features (obtained after applying multiple convolutions), and frequency features (FFT, wavelet, DCT). For audio samples, extractable features include, but are not limited to, frequency, energy, and distribution features. For text samples, extractable features include, but are not limited to, word frequency, tone, part of speech, bag of words, and word vectors. For sequence data samples, corresponding features can likewise be extracted. The above lists only some common sample types by way of example and does not mean that this embodiment limits the samples to be processed.
Whatever the sample type, this embodiment divides the samples based on their different feature types; that is, each divided sample group corresponds to a certain feature of the samples themselves, so each obtained sample group is representative. Meanwhile, the different sample groups correspond to different sample features, so the obtained sample groups are diverse, which ensures the accuracy of the subsequent quantization and the generalization capability of the resulting quantized model.
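To make the feature extraction concrete, the following is a minimal sketch, assuming NumPy is available, that computes a few of the gray-level features listed above for a grayscale image sample; the function name gray_level_features is chosen here for illustration and is not part of the original disclosure.

```python
import numpy as np

def gray_level_features(image: np.ndarray) -> np.ndarray:
    """Illustrative feature vector for a grayscale image: mean, variance,
    energy, mean gradient magnitude, and histogram information entropy."""
    img = image.astype(np.float64)
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()                      # gray-level probabilities
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))          # information entropy of the histogram
    gy, gx = np.gradient(img)                  # vertical / horizontal gradients
    grad_mag = float(np.sqrt(gx ** 2 + gy ** 2).mean())
    return np.array([img.mean(), img.var(), np.sum(img ** 2), grad_mag, entropy])
```

Analogous extractors (FFT coefficients for audio, word frequencies for text, and so on) would produce one fixed-length feature vector per training sample.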
Before the division is performed based on the feature types, a plurality of different feature types need to be determined, and the specific method is as follows:
First, the feature values of each training sample are extracted and combined into a feature vector; the feature vectors are then input into the unsupervised classification model of this embodiment, and the feature values they contain are classified with an unsupervised algorithm. Taking the K-means algorithm (an unsupervised algorithm) as an example: the user first inputs a preset classification number according to the task complexity and the approximate characteristics of the samples; the more complex the task and the more features the samples have, the larger the preset classification number should be. Assuming the preset classification number is k, the K-means model randomly selects k central feature values from all the feature values as initial centroids. The remaining feature values are then each gathered to the centroid nearest to them, forming k clusters; after each gathering, the centroids of the k clusters are recalculated, and the k clusters are regrouped in the same way with the recalculated centroids as the standard. These steps are executed iteratively until neither the clusters nor the centroids change any more, and the feature values corresponding to the final k centroids are taken as the central feature values.
Unsupervised classification is classification without a prior (known) class standard. Based on the differences in class characteristics of different image ground objects in feature space, the computer performs clustering and statistical analysis on the basis of cluster theory: a decision rule for classification is established from the statistical characteristics of the feature parameters of the samples to be classified, without prior knowledge of the class characteristics. The spatial distribution of the samples is divided or merged into clusters according to similarity, and the feature class represented by each cluster is then determined by field investigation or by comparison with features of known types. It is one method of pattern recognition; common algorithms include regression analysis, trend analysis, the equal mixed-distance method, cluster analysis, principal component analysis, and pattern recognition.
It can be understood that feature values corresponding to the same feature type have high similarity and are therefore close to each other, more precisely, close in Euclidean distance. During the distance-based iteration described above, if two feature values correspond to different features, the Euclidean distance between them is large, and grouping them together would inevitably shift the centroid. Therefore, the k (i.e., the preset classification number of) central feature values obtained after the iteration correspond to different feature types, so the feature type of each training sample can finally be determined from the feature types of the central feature values; put simply, different central feature values represent different feature types.
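As a concrete illustration of this clustering step, here is a minimal sketch, assuming scikit-learn is available and that each training sample has already been reduced to a fixed-length feature vector; the function name central_feature_values is an assumption for illustration, not part of the original disclosure.

```python
import numpy as np
from sklearn.cluster import KMeans

def central_feature_values(feature_vectors: np.ndarray, k: int) -> np.ndarray:
    """Cluster the per-sample feature vectors with K-means and return the k
    centroids, which play the role of the central feature values."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(feature_vectors)
    return km.cluster_centers_

# e.g. centers = central_feature_values(features, k=5) for a preset classification number of 5
```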
After the central feature values corresponding to the different feature types have been determined, the training samples can be divided based on them. In a specific embodiment, referring to fig. 3, fig. 3 is a detailed flowchart of step S202 in fig. 2; the dividing the plurality of training samples into a plurality of sample groups according to the feature types of the plurality of training samples includes:
Step A10, for any non-central feature value, selecting, based on the distribution of the feature values of the plurality of training samples, a target central feature value closest to the non-central feature value from the preset classification number of central feature values;
As described above, different central feature values represent different feature types; thus, in actual processing, dividing the samples by feature type is in fact dividing them by central feature value. Specifically, since the Euclidean distances between feature values of the same feature type are smaller than those between feature values of different feature types, clustering can be performed according to the distribution of the feature values, that is, according to the Euclidean distances between the non-central feature values and the central feature values. For each non-central feature value, it is therefore necessary to determine the central feature value closest to it in Euclidean distance.
Step A20, assigning the training samples corresponding to the non-central feature value to the sample group to which the training samples corresponding to the target central feature value belong.
For any non-central feature value, once the closest central feature value has been found, the two correspond to the same feature type and can be grouped into one class; that is, the training samples corresponding to the non-central feature value are assigned to the sample group to which the training samples corresponding to the target central feature value belong. After the training samples corresponding to all non-central feature values have been divided, a plurality of sample groups is obtained; naturally, the number of sample groups equals the number of central feature values, i.e., the preset classification number.
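The following is a minimal sketch of steps A10 and A20 under the same assumptions as above (NumPy, one feature vector per sample); the helper name group_samples is illustrative only.

```python
import numpy as np

def group_samples(feature_vectors: np.ndarray, centers: np.ndarray) -> list:
    """Step A10: find, for every feature vector, the nearest central feature
    value in Euclidean distance. Step A20: collect the sample indices into
    one group per central feature value."""
    # Pairwise Euclidean distances, shape (num_samples, k)
    dists = np.linalg.norm(feature_vectors[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)              # index of the nearest centroid
    return [np.where(labels == g)[0] for g in range(len(centers))]
```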
Step S203, extracting a preset number of training samples from each of the plurality of sample groups to obtain a model quantization input sample set;
After the plurality of sample groups has been obtained, a preset number of training samples can be extracted from each sample group according to actual needs, and together they form the model quantization input sample set. In actual operation it is not necessary to extract samples strictly in accordance with the preset number; that is, approximately the same or exactly the same number of samples may be extracted from each sample group.
In a specific embodiment, referring to fig. 4, fig. 4 is a detailed flowchart of step S203 in fig. 2; the extracting a preset number of training samples from each of the plurality of sample groups to obtain a model quantization input sample set includes:
Step B10, for any sample group, randomly sampling the sample group to extract a preset number of training samples;
Step B20, obtaining the model quantization input sample set based on the training samples extracted from the plurality of sample groups.
When extracting the training samples from a sample group, one possible approach is random sampling: a preset number of training samples is drawn arbitrarily from each sample group, the selection being random and subject to no specific rule. After a certain number of training samples has been extracted from each sample group, all of them are combined and used together as the model quantization input sample set.
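A minimal sketch of this random-sampling variant, assuming the groups are arrays of sample indices as produced by the grouping sketch above; the names random_calibration_set and n_per_group are illustrative.

```python
import numpy as np

def random_calibration_set(groups, samples, n_per_group: int, seed: int = 0) -> list:
    """Randomly draw a preset number of samples from every group and merge
    them into the model quantization input sample set."""
    rng = np.random.default_rng(seed)
    picked = []
    for idx in groups:                          # idx: sample indices of one group
        take = min(n_per_group, len(idx))       # a group may hold fewer samples than requested
        picked.extend(rng.choice(idx, size=take, replace=False))
    return [samples[i] for i in picked]
```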
In another embodiment, referring to fig. 5, fig. 5 is a detailed flowchart of step S203 in fig. 2; the extracting a preset number of training samples from each of the plurality of sample groups to obtain a model quantization input sample set further includes:
Step C10, for any sample group, determining the feature value distribution of the corresponding training samples;
Step C20, extracting a preset number of training samples from the sample group according to a rule derived from the feature value distribution;
Step C30, obtaining the model quantization input sample set based on the training samples extracted from the plurality of sample groups.
When extracting the training samples from a sample group, besides the random sampling described above, rule-based extraction can be performed according to the feature distribution within each sample group. Taking the frequency features of audio samples as an example: the frequencies corresponding to the different signal components, i.e., the different training samples, in the audio signal can be sorted, for example from large to small, and a preset number of training samples then extracted at equal intervals according to the sorting result. Alternatively, the total frequency range can be divided into a plurality of frequency intervals according to the component frequencies, and a certain number of training samples selected from each interval; of course, the total number selected must satisfy, or at least approach, the preset number of samples. Finally, all training samples extracted from all sample groups are combined and used as the model quantization input sample set. It should be noted that the frequency features of audio samples serve only as an example here; the method is not limited to them and can be used to sample various features of various samples in a substantially identical manner.
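A minimal sketch of such rule-based extraction, assuming each sample in a group is characterized by one scalar feature value (e.g., a component frequency); sorting from large to small and picking at equal intervals follows the first rule described above. The name interval_calibration_indices is illustrative.

```python
import numpy as np

def interval_calibration_indices(group_values: np.ndarray, n: int) -> np.ndarray:
    """Sort one group's samples by a scalar feature value (large to small)
    and pick n of them at equal intervals, so the selection spans the whole
    feature-value distribution of the group."""
    order = np.argsort(group_values)[::-1]                     # large-to-small ordering
    picks = np.linspace(0, len(order) - 1, num=min(n, len(order))).astype(int)
    return order[picks]                                        # indices of the chosen samples
```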
Step S204, obtaining the dynamic ranges of the weights and activation values from the model quantization input sample set, determining quantization parameters, quantizing the network, and obtaining a quantized neural network model;
After the model quantization input sample set has been obtained, it can be used to quantize the neural network model to be quantized; specifically, the weights and the activation values in the neural network model are quantized to obtain the quantized neural network model.
In a specific embodiment, referring to fig. 6, fig. 6 is a detailed flowchart of step S204 in fig. 2; the obtaining the dynamic ranges of the weights and activation values from the model quantization input sample set, determining quantization parameters to quantize the network, and obtaining a quantized neural network model includes:
Step D10, obtaining the dynamic ranges of the weights and activation values from the plurality of input samples in the model quantization input sample set, and quantizing the weights with a quantization tool to obtain quantized weights;
and after the model quantization input sample set is obtained, performing subsequent quantization processing on the neural network model by using a quantization tool. The quantization tool used in this embodiment may be various open source tools based on development environments such as Python, C + +, and the like, such as tensrflow Lite, TensorRT, PaddleSlim, Pytorch, and the like, or may be an open/closed source tool provided by a deep learning chip manufacturer, which is not limited in this embodiment.
Specifically, after the input model quantizes the input sample set, the quantization tool performs quantization calculation on the weight parameter according to a certain calculation rule to complete quantization of the weight parameter. For example, the maximum and minimum values in the tensor are counted, and then the middle value is used for carrying out symmetrical quantization; for another example, an intermediate value is specified, and asymmetric quantization is performed according to a distance from a weight to the intermediate value in the tensor, of course, a specific quantization calculation rule is determined by a rule defined by a quantization tool used, which is not limited in this embodiment.
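As an illustration of such a rule, here is a minimal sketch of symmetric int8 weight quantization; the patent does not prescribe this exact formula, and the common max-absolute-value variant is used here as an assumption.

```python
import numpy as np

def quantize_weights_symmetric(w: np.ndarray, num_bits: int = 8):
    """Symmetric quantization of a weight tensor: the scale is derived from
    the largest absolute value so that the quantized range is centered on zero."""
    qmax = 2 ** (num_bits - 1) - 1             # 127 for int8
    max_abs = float(np.abs(w).max())
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale                            # dequantize with q.astype(float) * scale
```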
Step D20, quantizing the activation values according to the quantized weights and the plurality of input samples to obtain a quantized neural network model.
On this basis, to improve the generalization capability of the quantized neural network model, this embodiment also quantizes the output values of the activation functions in the neural network model. After the weights have been quantized and updated, the model quantization input sample set is weighted and summed using the updated weights, and the result is fed into the activation function. The activation function is a nonlinear function that acts as a mapping: it maps input values to output values within a certain range, so the parameter range of the activation values of the neural network model can be determined, yielding the quantized neural network model.
The model quantization input sample set has a large influence on the activation value parameters of the model; for example, if an outlier exists in a sample group, the range of the resulting activation values may widen noticeably, reducing quantization accuracy. In this embodiment, a representative and diverse model quantization input sample set is obtained in advance, with every sample strictly classified by its features, so the outlier problem does not arise and quantization accuracy can be improved. Moreover, if no activation function were used, the input of each layer of nodes in the neural network model would be a linear function of the output of the previous layer; no matter how many layers the model had, every output would be a linear combination of the inputs, much like the original perceptron, and the approximation capability, i.e., the generalization capability, of the model would be quite limited. Therefore, to improve the generalization capability of the quantized neural network model, this embodiment introduces nonlinear functions such as the Sigmoid, ReLU, Leaky-ReLU, tanh, and ELU functions as activation functions; and to improve the precision of quantizing the activation functions, the representative and diverse model quantization input sample set described above must be selected.
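To show how the input sample set determines the activation ranges, here is a minimal sketch of activation-range calibration; run_layer stands for any callable returning one layer's activation output for one input sample and is purely a placeholder, as is the symmetric int8 scale at the end.

```python
import numpy as np

def calibrate_activation_range(run_layer, calibration_set):
    """Feed the model quantization input sample set through a layer whose
    weights are already quantized and record the dynamic range of its
    activation outputs; the range then yields the activation scale."""
    lo, hi = float("inf"), float("-inf")
    for x in calibration_set:
        a = np.asarray(run_layer(x))           # activation output for one input sample
        lo, hi = min(lo, float(a.min())), max(hi, float(a.max()))
    scale = max(abs(lo), abs(hi)) / 127.0      # symmetric int8 activation scale (assumption)
    return (lo, hi), scale
```

An outlier sample would stretch (lo, hi) and coarsen the scale, which is exactly why the representative, outlier-free input sample set matters here.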
Finally, the quantized neural network is ported to the embedded device for normal use.
In this embodiment, the training samples are classified based on their features to obtain a plurality of sample groups with different characteristics, each corresponding to a certain feature of the original samples, so every sample group is representative. A certain number of training samples is then extracted from each sample group and used collectively as model input, so the extracted training samples are diverse. Quantizing the model with this diverse and representative input sample set improves the accuracy of model quantization and the generalization capability of the quantized model.
Based on the same inventive concept, an embodiment of the present invention further provides a neural network model quantization apparatus, which is shown in fig. 7 and includes:
a sample acquisition module, configured to acquire a plurality of training samples and a neural network model to be quantized;
a sample dividing module, configured to divide the plurality of training samples into a plurality of sample groups according to the feature types of the plurality of training samples;
a sample sampling module, configured to extract a preset number of training samples from each of the plurality of sample groups to obtain a model quantization input sample set;
and a model quantization module, configured to obtain the dynamic ranges of the weights and activation values from the model quantization input sample set, determine quantization parameters to quantize the network, and obtain a quantized neural network model.
Furthermore, in an embodiment, the present application further provides a computer storage medium having a computer program stored thereon, where the computer program is executed by a processor to implement the steps of the method in the foregoing method embodiments.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories. The computer may be a variety of computing devices including intelligent terminals and servers.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, e.g., in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. A method for neural network model quantization, the method comprising:
obtaining a plurality of training samples and a neural network model to be quantized;
dividing the training samples into a plurality of sample groups according to the feature types of the training samples;
extracting a preset number of training samples from each of the plurality of sample groups to obtain a model quantization input sample set;
and obtaining the dynamic ranges of the weights and activation values from the model quantization input sample set, determining quantization parameters, quantizing the network, and obtaining a quantized neural network model.
2. The method of claim 1, wherein before the dividing the plurality of training samples into the plurality of sample groups according to the feature types of the plurality of training samples, the method further comprises:
extracting feature values of the plurality of training samples;
clustering the feature values of the plurality of training samples with an unsupervised algorithm based on the distribution of the feature values, and selecting a preset classification number of central feature values from the feature values of the plurality of training samples;
and determining the feature types of the plurality of training samples according to the feature types corresponding to the central feature values.
3. The method of claim 2, wherein the dividing the plurality of training samples into a plurality of sample groups according to the feature types of the plurality of training samples comprises:
for any non-central feature value, selecting, based on the distribution of the feature values of the plurality of training samples, a target central feature value closest to the non-central feature value from the preset classification number of central feature values;
and assigning the training samples corresponding to the non-central feature value to the sample group to which the training samples corresponding to the target central feature value belong.
4. The method of claim 1, wherein the extracting a preset number of training samples from each of the plurality of sample groups to obtain a model quantization input sample set comprises:
for any sample group, randomly sampling the sample group to extract a preset number of training samples;
and obtaining the model quantization input sample set based on the training samples extracted from the plurality of sample groups.
5. The method of claim 1, wherein the extracting a preset number of training samples from each of the plurality of sample groups to obtain a model quantization input sample set comprises:
for any sample group, determining the feature value distribution of the corresponding training samples;
extracting a preset number of training samples from the sample group according to a rule derived from the feature value distribution;
and obtaining the model quantization input sample set based on the training samples extracted from the plurality of sample groups.
6. The method of claim 1, wherein the obtaining the dynamic ranges of the weights and activation values from the model quantization input sample set, determining quantization parameters to quantize the network, and obtaining a quantized neural network model comprises:
obtaining the dynamic ranges of the weights and activation values from a plurality of input samples in the model quantization input sample set, and quantizing the weights with a quantization tool to obtain quantized weights;
and quantizing the activation values according to the quantized weights and the plurality of input samples to obtain a quantized neural network model.
7. The method of claim 1, wherein after the obtaining the dynamic ranges of the weights and activation values from the model quantization input sample set, determining quantization parameters to quantize the network, and obtaining the quantized neural network model, the method further comprises:
porting the quantized neural network model to an embedded device.
8. An apparatus for neural network model quantization, the apparatus comprising:
a sample acquisition module, configured to acquire a plurality of training samples and a neural network model to be quantized;
a sample dividing module, configured to divide the plurality of training samples into a plurality of sample groups according to the feature types of the plurality of training samples;
a sample sampling module, configured to extract a preset number of training samples from each of the plurality of sample groups to obtain a model quantization input sample set;
and a model quantization module, configured to obtain the dynamic ranges of the weights and activation values from the model quantization input sample set, determine quantization parameters to quantize the network, and obtain a quantized neural network model.
9. A neural network model quantization device comprising a memory, a processor and a neural network model quantization program stored on the memory and executable on the processor, the neural network model quantization program, when executed by the processor, implementing the steps of the neural network model quantization method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a neural network model quantization program, which when executed by a processor, implements the steps of the neural network model quantization method according to any one of claims 1 to 7.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210326787.5A | 2022-03-30 | 2022-03-30 | Neural network model quantization method, device, equipment and storage medium |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN114662672A | 2022-06-24 |
Family
- ID: 82032757

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210326787.5A (pending) | Neural network model quantization method, device, equipment and storage medium | 2022-03-30 | 2022-03-30 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN114662672A (en) |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |