WO2022021834A1 - Neural network model determination method and apparatus, electronic device, medium, and product - Google Patents

Neural network model determination method and apparatus, electronic device, medium, and product

Info

Publication number
WO2022021834A1
WO2022021834A1 (PCT/CN2021/075472, CN2021075472W)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
initial model
parameter
parameters
model
Prior art date
Application number
PCT/CN2021/075472
Other languages
English (en)
French (fr)
Inventor
李伯勋
张弛
Original Assignee
北京迈格威科技有限公司 (Beijing Megvii Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京迈格威科技有限公司 (Beijing Megvii Technology Co., Ltd.)
Publication of WO2022021834A1 publication Critical patent/WO2022021834A1/zh

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods

Definitions

  • the present application relates to the technical field of information processing, and in particular, to a method, apparatus, electronic device, computer-readable storage medium, and computer program product for determining a neural network model.
  • With the rapid development of research on neural network-related technologies, a large number of such technologies have emerged in related fields, for example convolutional neural networks in the field of vision and recurrent neural networks in speech recognition and natural language processing. These neural network technologies have greatly improved processing accuracy in their respective fields.
  • The parameters of a neural network model are usually on the order of millions, tens of millions, or even hundreds of millions, which places high demands on computing and storage devices. In particular, when a neural network model is deployed in mobile-terminal applications, such as access control systems, shopping-mall monitoring, subway entrances, and mobile phones, it demands too much of the terminal's computing resources and memory.
  • Consequently, compression algorithms for neural networks deployed on terminals have become a research hotspot.
  • Network compression methods generally include quantization, pruning, and low-rank decomposition.
  • Quantization refers to converting floating-point network parameters into integer network parameters, which reduces the storage space occupied by the parameters; the quantization methods in the prior art are mainly based on the range of the network parameters.
  • Although the model parameters quantized in this way occupy less storage space and speed up the computation of the neural network, this approach also lowers the accuracy of the neural network model.
  • The purpose of the embodiments of the present application is to provide a method, apparatus, electronic device, computer-readable storage medium, and computer program product for determining a neural network model, so as to at least alleviate the problem in the prior art that reducing the storage space of neural network model parameters lowers model accuracy.
  • An embodiment of the present application provides a method for determining a neural network model, the method comprising: obtaining initial model parameters in the neural network model; determining the mathematical distribution corresponding to the initial model parameters; determining quantization parameters corresponding to the initial model parameters according to the mathematical distribution; and
  • quantizing the initial model parameters in the neural network model by using the quantization parameters to obtain an updated neural network model.
  • In this way, the corresponding quantization parameters are determined according to the mathematical distribution corresponding to the initial model parameters in the neural network model, and the initial model parameters are then quantized using those quantization parameters to obtain an updated neural network model.
  • By relying on the mathematical distribution of the initial model parameters, certain mathematical regularities of the model parameters in each network layer can be taken into account, so that the quantized model parameters fall within a reasonable range. This not only reduces the storage space of the model parameters but also improves the computation speed and accuracy of the updated neural network model.
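The method steps can be sketched in Python. This is a minimal illustrative sketch, not the application's reference implementation: the assumed normal distribution of the weights, the coefficient of 3, and the signed 8-bit target range are all assumptions made for the example.

```python
import numpy as np

def quantize_layer(weights, coeff=3.0):
    """Quantize one layer's float weights to int8, deriving the
    quantization parameter from the weights' assumed normal
    distribution (quantization parameter = coeff * standard deviation)."""
    quant_param = coeff * float(weights.std())  # range derived from the distribution
    step = quant_param / 127.0                  # map [-quant_param, quant_param] to int8
    q = np.clip(np.round(weights / step), -127, 127).astype(np.int8)
    return q, step

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(64, 64)).astype(np.float32)  # initial float32 parameters
q, step = quantize_layer(w)
restored = q.astype(np.float32) * step  # dequantized values stay close to the originals
```

Because the quantization range follows the parameters' distribution rather than their raw min/max, a few extreme values saturate instead of stretching the range for every other parameter.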
  • Determining the quantization parameter corresponding to the initial model parameters according to the mathematical distribution includes: determining, according to the mathematical distribution, the degree of dispersion between the initial model parameters in the target network layer of the neural network model; and
  • determining the quantization parameter corresponding to the initial model parameters of the target network layer based on the degree of dispersion.
  • the corresponding quantization parameters are determined by considering the degree of dispersion between the initial model parameters, so that the obtained quantization parameters are more reasonable, and then the initial model parameters can be quantized to a reasonable range.
  • determining the degree of dispersion between initial model parameters in the target network layer of the neural network model according to the mathematical distribution including:
  • the standard deviation between the initial model parameters in the target network layer of the neural network model is calculated and obtained, and the standard deviation is used to characterize the degree of dispersion.
  • Determining the quantization parameter corresponding to the initial model parameters of the target network layer based on the degree of dispersion includes: calculating the product of the standard deviation and a preset coefficient to obtain a first value of the target network layer; and
  • determining the first value as the quantization parameter corresponding to the initial model parameters of the target network layer.
  • the standard deviation can be adjusted correspondingly through the preset coefficient, so that a more reasonable quantization parameter can be obtained.
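A minimal sketch of this first way, in which the quantization parameter is simply the product of the standard deviation and a preset coefficient (the default coefficient value of 2.5 below is hypothetical):

```python
import numpy as np

def quant_param_from_std(params, coeff=2.5):
    """First value = preset coefficient * standard deviation of the
    target network layer's initial model parameters; this first value
    is used as the layer's quantization parameter."""
    return coeff * float(np.std(params))
```

For parameters assumed to follow a normal distribution, a coefficient around 2 to 3 keeps most values inside the quantization range, since roughly 95% of normally distributed values lie within two standard deviations of the mean.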
  • Determining the quantization parameters corresponding to the initial model parameters of the target network layer in the neural network model according to the mathematical distribution includes: determining, according to the mathematical distribution, the mean value of the initial model parameters in the target network layer; and
  • determining the quantization parameter corresponding to the initial model parameters of the target network layer based on the mean value.
  • the corresponding quantization parameters are determined by considering the mean value between the initial model parameters, so that the obtained quantization parameters are more reasonable, and the model parameters can be quantized to a reasonable range.
  • Determining the quantization parameter corresponding to the initial model parameters of the target network layer based on the mean value includes: calculating the product of the mean value and a preset coefficient to obtain a second value of the target network layer; and
  • determining the second value as the quantization parameter corresponding to the initial model parameters of the target network layer.
  • the mean value can be adjusted correspondingly through the preset coefficient, so that a more reasonable quantization parameter can be obtained.
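A minimal sketch of this second way. The application says only "mean value"; the mean of absolute values is used here as an assumption, so that zero-centered parameters do not yield a near-zero quantization parameter, and the default coefficient of 4.0 is hypothetical:

```python
import numpy as np

def quant_param_from_mean(params, coeff=4.0):
    """Second value = preset coefficient * mean of the layer's initial
    model parameters. Taking the mean of absolute values is an
    assumption made for this sketch; the text says only 'mean value'."""
    return coeff * float(np.mean(np.abs(params)))
```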
  • Determining the mathematical distribution corresponding to the initial model parameters of the at least one network layer includes: determining the mathematical distribution corresponding to the initial model parameters of each network layer of the neural network model, wherein the types of mathematical distributions corresponding to at least two network layers in the neural network model are different.
  • the initial model parameter is a weight type parameter, an intermediate result type parameter or an output value type parameter, and the mathematical distributions corresponding to different types of initial model parameters are different.
  • the initial model parameter is a floating-point parameter
  • Quantizing the initial model parameters in the neural network model using the quantization parameters to obtain an updated neural network model includes:
  • converting, by using the quantization parameter of the corresponding network layer, each initial model parameter in that network layer into an integer parameter to obtain the updated neural network model, where the integer parameters are the quantized model parameters.
  • In this way, the floating-point model parameters are quantized into integer model parameters by using the quantization parameters, which reduces the storage space of the parameters and improves the computation speed of the neural network model.
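The float-to-integer conversion can be sketched as follows; the signed 8-bit target range and the symmetric mapping are assumptions made for illustration:

```python
import numpy as np

def to_int8(params, quant_param):
    """Map floating-point parameters in [-quant_param, quant_param]
    onto the signed 8-bit integer range; out-of-range values saturate."""
    step = quant_param / 127.0
    return np.clip(np.round(params / step), -127, 127).astype(np.int8)
```

Stored as int8, each parameter needs 1 byte instead of the 4 bytes of a float32 value, a 4x reduction in parameter storage.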
  • After the updated neural network model is obtained, the method further includes:
  • training the updated neural network model to obtain a trained neural network model, thereby further improving the accuracy of the neural network model.
  • the mathematical distribution includes at least one of normal distribution, half-normal distribution, Bernoulli distribution, binomial distribution, multinomial distribution, uniform distribution, exponential distribution, and sampling distribution.
  • an embodiment of the present application provides an apparatus for determining a neural network model, the apparatus comprising:
  • the model parameter obtaining module is used to obtain the initial model parameters in the neural network model
  • a mathematical distribution determination module used for determining the mathematical distribution corresponding to the initial model parameters
  • a quantization parameter determination module configured to determine the quantization parameter corresponding to the initial model parameter according to the mathematical distribution
  • a model determination module configured to perform quantization processing on the initial model parameters in the neural network model by using the quantization parameters to obtain an updated neural network model.
  • the quantization parameter determination module includes:
  • a first quantization parameter calculation module configured to determine, according to the mathematical distribution, the degree of dispersion between the initial model parameters in the target network layer of the neural network model;
  • a first quantization parameter determination module configured to determine the quantization parameter corresponding to the initial model parameter of the target network layer based on the degree of dispersion.
  • the first quantization parameter calculation module is specifically configured to calculate the standard deviation between the initial model parameters in the target network layer of the neural network model, where the standard deviation is used to characterize the degree of dispersion.
  • the first quantization parameter calculation module includes:
  • a first quantization parameter calculation submodule configured to calculate the product of the standard deviation between the initial model parameters and the preset coefficient, to obtain the first value of the target network layer;
  • the first quantization parameter determination submodule is used for determining the first numerical value of the target network layer as the quantization parameter corresponding to the initial model parameter of the target network layer.
  • the quantization parameter determination module includes:
  • a second quantization parameter calculation module configured to determine the mean value of the initial model parameters in the target network layer of the neural network model according to the mathematical distribution
  • the second quantization parameter determination module is used to determine the quantization parameter corresponding to the initial model parameter of the target network layer based on the mean value.
  • the second quantization parameter determination module includes:
  • the second quantization parameter calculation module is used to calculate the product between the mean value and the preset coefficient to obtain the second value of the corresponding network layer;
  • the second quantization parameter determination submodule is configured to determine the second value as the quantization parameter corresponding to the initial model parameter of the target network layer.
  • the determining the mathematical distribution corresponding to the initial model parameters of the at least one network layer includes:
  • the mathematical distribution corresponding to the initial model parameters of each network layer of the neural network model is determined, wherein the types of mathematical distributions corresponding to at least two network layers in the neural network model are different.
  • the initial model parameters obtained by the model parameter obtaining module are weight type parameters, intermediate result type parameters or output value type parameters, and the mathematical distributions corresponding to different types of initial model parameters are different.
  • the initial model parameters obtained by the model parameter obtaining module are floating-point parameters
  • the model determination module is configured to convert each initial model parameter in the corresponding target network layer into an integer-type parameter to obtain the updated neural network model, where the integer-type parameters are the quantized model parameters.
  • the device further includes:
  • a model training module configured to train the neural network model updated by the model determination module to obtain a trained neural network model.
  • the mathematical distribution includes at least one of normal distribution, half-normal distribution, Bernoulli distribution, binomial distribution, multinomial distribution, uniform distribution, exponential distribution, and sampling distribution.
  • An embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, the steps in the neural network model determination method provided by the first aspect are performed.
  • an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, executes the steps in the above-described neural network model determination method.
  • an embodiment of the present application provides a computer program product, including computer program code, when the computer program code is executed on an electronic device, the electronic device executes the above-described neural network model determination method.
  • In the embodiments of the present application, the initial model parameters in the neural network model are first obtained; the mathematical distribution corresponding to the initial model parameters is determined; the quantization parameters corresponding to the initial model parameters are determined according to the mathematical distribution; and the quantization parameters are used to quantize the initial model parameters of one or more network layers in the neural network model to obtain an updated neural network model.
  • In the above process, the initial model parameters of one or more network layers in the neural network model are quantized, so that
  • the initial model parameters of the one or more network layers are quantized within a reasonable range, which not only reduces the storage space of the model parameters of the neural network model but also improves the computation speed and accuracy of the neural network model.
  • FIG. 1 is a schematic structural diagram of an electronic device for performing a method for determining a neural network model provided by an embodiment of the present application
  • FIG. 2 is a flowchart of a method for determining a neural network model according to an embodiment of the present application
  • FIG. 3 is a structural block diagram of an apparatus for determining a neural network model according to an embodiment of the present application.
  • As the parameters of a neural network model increase, the storage space occupied by the neural network model becomes larger.
  • In order to minimize the storage space occupied by the neural network model, the neural network model can be compressed.
  • For example, the neural network model can be compressed by quantizing its model parameters, so that the space the model occupies on the terminal
  • is as small as possible and more storage space is left for the terminal's other processing tasks.
  • An embodiment of the present application provides a method for determining a neural network model, in which corresponding quantization parameters are determined according to the mathematical distribution corresponding to the initial model parameters in the neural network model, and the quantization parameters are then used to quantize the initial model parameters to obtain an updated neural network model.
  • In this way, the initial model parameters of the neural network model are quantized so that the initial model parameters of the corresponding network layers fall within a reasonable range, which not only reduces the storage space of the model parameters of the neural network model but also improves the computation speed and accuracy of the neural network model.
  • FIG. 1 is a schematic structural diagram of an electronic device for executing a method for determining a neural network model according to an embodiment of the present application.
  • the electronic device may include: at least one processor 110, such as a CPU, at least one communication interface 120, at least one memory 130, and at least one communication bus 140.
  • the communication bus 140 is used to realize the direct connection and communication of these components.
  • the communication interface 120 of the device in the embodiment of the present application is used to communicate signaling or data with other node devices.
  • the memory 130 may be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory.
  • the memory 130 may optionally also be at least one storage device located remotely from the aforementioned processor.
  • Computer-readable instructions are stored in the memory 130.
  • When the computer-readable instructions are executed by the processor 110, the electronic device performs the method process shown in FIG. 2 below.
  • the memory 130 can be used to store the neural network model.
  • When the processor 110 quantizes the initial model parameters, it can obtain the initial model parameters in the neural network model from the memory 130; the initial model parameters may be those of a single target network layer in the neural network model
  • or those of multiple target network layers, which is not limited in this embodiment. The processor then determines the mathematical distributions corresponding to these initial model parameters, determines the quantization parameters of the corresponding target network layers in the neural network model according to the mathematical distributions, and uses the quantization parameters to quantize the initial model parameters to obtain quantized model parameters.
  • the quantized model parameters and the obtained updated neural network model may also be stored in the memory 130 .
  • the electronic device can be a terminal device or a server.
  • When the neural network model is deployed on the terminal device, the electronic device is the terminal device.
  • When the neural network model is deployed on the server, the electronic device is the server.
  • the electronic device can also be a server.
  • the server can communicate with the terminal device through the network, and the terminal device can send the obtained model parameters to the server.
  • the server quantizes the model parameters and sends the quantized model parameters back to the terminal device.
  • FIG. 1 is only for illustration, and the electronic device may further include more or less components than those shown in FIG. 1 , or have different configurations than those shown in FIG. 1 .
  • Each component shown in FIG. 1 may be implemented in hardware, software, or a combination thereof.
  • FIG. 2 is a flowchart of a method for determining a neural network model provided by an embodiment of the present application. The method includes the following steps:
  • Step S110 Obtain initial model parameters in the neural network model.
  • The obtained initial model parameters can be the initial model parameters of one network layer in the neural network model or of multiple network layers; either the single network layer or the multiple network layers can be referred to as target network layers.
  • the embodiments of the present application take the example of traversing each network layer in the neural network model to obtain the initial model parameters of each network layer.
  • In the order from the input to the output of the network model, the network layers can be traversed layer by layer from shallow to deep, that is, from front to back; alternatively,
  • in the order from the output to the input of the network model, the network layers can be traversed layer by layer from deep to shallow, that is, from back to front. Either way, the initial model parameters of each network layer can be obtained by the traversal.
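The two traversal orders can be sketched as follows; the model here is a hypothetical ordered mapping from layer name to parameters, not a structure defined by the application:

```python
import numpy as np

# Hypothetical model: ordered from the shallowest (input-side) layer
# to the deepest (output-side) layer.
model = {
    "conv1": np.array([0.10, -0.20, 0.30], dtype=np.float32),
    "conv2": np.array([0.05, 0.07], dtype=np.float32),
    "fc":    np.array([0.40, -0.40], dtype=np.float32),
}

# Front-to-back traversal: shallow layers first.
front_to_back = [name for name in model]

# Back-to-front traversal: deep layers first.
back_to_front = [name for name in reversed(model)]
```

Either order visits every layer once, so each layer's initial model parameters are obtained exactly once for quantization.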
  • the parameter types of the initial model parameters may include, but are not limited to: weights (such as parameters included in the convolution layer), intermediate results (such as feature maps, feature vectors, etc.), output values, and the like.
  • Step S120 Determine the mathematical distribution corresponding to the initial model parameters.
  • Because each initial model parameter of the neural network model is of floating-point type by default, it may contain many decimal places, so storing the parameters occupies a large amount of memory. Therefore, in order to reduce the memory occupied by the parameters and improve the computation speed of the neural network model, each initial model parameter can be quantized.
  • the quantization of the initial model parameters refers to converting the floating-point parameters into integer-type parameters within a certain value range.
  • The mathematical distribution corresponding to the initial model parameters can be used to obtain a more reasonable quantization parameter; that is, the mathematical distribution corresponding to the initial model parameters
  • in the neural network model can be determined first. For example, the mathematical distribution corresponding to the initial model parameters of the first network layer in the neural network model can be determined as needed,
  • and the mathematical distributions corresponding to the initial model parameters of other network layers can also be determined, which is not limited in this embodiment.
  • the above-mentioned mathematical distribution may be preset, for example, the mathematical distribution may include at least one of a normal distribution, a half-normal distribution, a Bernoulli distribution, a binomial distribution, a multinomial distribution, a uniform distribution, an exponential distribution, a sampling distribution, and the like.
  • the mathematical distribution conforming to the corresponding initial model parameters may be set as required, wherein the mathematical distribution may also include other mathematical distributions, such as Poisson distribution.
  • The initial model parameters may refer to the parameters of one network layer in the neural network model, of each of multiple network layers, or of every network layer;
  • the parameters of each network layer may conform to one uniform type of mathematical distribution or to different types of distributions, for example a normal distribution or a half-normal distribution.
  • For example, the user can input in the electronic device the mathematical distribution corresponding to the initial model parameters, such as a normal distribution, so that after obtaining the initial model parameters in the neural network model, the electronic device can determine that
  • the corresponding mathematical distribution is the normal distribution.
  • The electronic device may also pre-store the mathematical distribution corresponding to the initial model parameters.
  • For example, the pre-stored mathematical distribution corresponding to the initial model parameters is a normal distribution, so that after obtaining the initial model parameters of the neural network model, the electronic device
  • only needs to look up the pre-stored mathematical distribution corresponding to those initial model parameters to determine it.
  • the mathematical distribution corresponding to the initial model parameters of each network layer of the neural network model may also be different or partially the same.
  • For example, the mathematical distribution corresponding to the initial model parameters of one network layer is a normal distribution,
  • that of another network layer is a half-normal distribution,
  • and that of yet another network layer is a normal distribution, etc. That is to say, the mathematical distributions corresponding to the initial model parameters of the network layers in the neural network model can be partly the same and partly different, so that different quantization parameters can be obtained for the initial model parameters of different network layers and the initial model parameters of each layer can be quantized separately to a more reasonable range.
  • the types of mathematical distributions corresponding to at least two network layers in the neural network model are different.
  • The corresponding mathematical distribution can be set in advance for the initial model parameters of one or more network layers in the neural network model, so that after obtaining the initial model parameters of each network layer in turn, the electronic device can determine the mathematical distribution corresponding to the initial model parameters of each network layer.
  • The initial model parameters of each network layer of the neural network model may include multiple types of model parameters, such as weight-type parameters, intermediate-result-type parameters, and output-value-type parameters. Therefore, an initial model parameter may be a weight-type parameter, an intermediate-result-type parameter, or an output-value-type parameter.
  • The mathematical distributions corresponding to different types of initial model parameters can also be different or the same. For example, the mathematical distribution corresponding to the weight-type parameters among the initial model parameters is a normal distribution, that corresponding to the intermediate-result-type parameters is a half-normal distribution, and that corresponding to the output-value-type parameters is a binomial distribution, etc.
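The per-type lookup described above could be held in a simple table; the distribution choices below merely restate the example in the text and are not prescribed by the application:

```python
# Assumed mapping from parameter type to the mathematical distribution
# the parameters are taken to conform to (matching the example above).
DISTRIBUTION_BY_TYPE = {
    "weight": "normal",
    "intermediate_result": "half_normal",  # e.g. non-negative feature maps
    "output_value": "binomial",
}

def distribution_for(param_type):
    """Look up the mathematical distribution assumed for a parameter type."""
    return DISTRIBUTION_BY_TYPE[param_type]
```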
  • the electronic device can identify the parameter type of the initial model parameters, and then search for the mathematical distribution corresponding to the parameter type, so as to determine the mathematical distribution corresponding to the initial model parameters.
  • the mathematical distribution corresponding to the above-mentioned initial model parameters can be understood as assuming that the initial model parameters conform to a certain mathematical distribution, not actually that the initial model parameters conform to a certain mathematical distribution.
  • the quantization parameters are calculated based on the mathematical distribution that the initial model parameters conform to, which can make the calculated quantization parameters fit the initial model parameters more closely, and the obtained quantization parameters are more reasonable.
  • In this way, the values of the quantized model parameters do not deviate too much from the initial model parameters, so that the quantized model parameters are within a reasonable range, which not only reduces the storage space of the model parameters but also improves the accuracy of the neural network model.
  • Step S130 Determine quantization parameters corresponding to the initial model parameters according to the mathematical distribution.
  • the quantization parameter can be understood as a quantization range or quantization standard of the initial model parameter, that is, the initial model parameter can be quantized to an appropriate range by using the quantization parameter.
  • Since the value of the quantization parameter obtained based on the mathematical distribution corresponding to the initial model parameters is related to the values of the initial model parameters, a corresponding quantization parameter is obtained for the initial model parameters in the neural network model.
  • the quantization parameters of the initial model parameters of the corresponding network layers in the neural network model can be determined by the following two ways of determining the quantization parameters:
  • One way is: according to the mathematical distribution, the standard deviation between the initial model parameters in the corresponding network layer is calculated, and the standard deviation is used to characterize the degree of dispersion;
  • the quantization parameter of the initial model parameters of the corresponding network layer is then determined based on the degree of dispersion.
  • The other way to determine the quantization parameters corresponding to the initial model parameters is:
  • according to the mathematical distribution, the mean value of the initial model parameters in the corresponding network layer of the neural network model is determined; then the quantization parameter of the initial model parameters of the corresponding network layer is determined based on the mean value.
  • Determining the quantization parameters of the initial model parameters of the corresponding network layer based on the mean value includes: calculating the product of the mean value and the preset coefficient to obtain the second value of the corresponding network layer; and determining the second value as the quantization parameter of the initial model parameters of the corresponding network layer.
  • Step S140 Perform quantization processing on the initial model parameters in the neural network model by using the quantization parameters to obtain an updated neural network model.
  • After the quantization parameters are obtained, they can be used to quantize the initial model parameters of the target network layer in the neural network model, so that the quantized, that is, updated, neural network model is obtained.
  • The model parameters in the updated neural network model are the model parameters obtained by quantizing the initial model parameters.
  • the corresponding quantization parameters are determined according to the mathematical distribution corresponding to the initial model parameters in the neural network model, and then the quantization parameters are used to quantify the initial model parameters to obtain an updated neural network model.
  • In this way, the initial model parameters of the neural network model are quantized so that the model parameters of the quantized neural network model are within a reasonable range, which not only reduces the storage space
  • of the model parameters but also improves the computation speed and accuracy of the neural network model.
  • the corresponding quantization parameters can be determined for each network layer in the neural network model, that is, the corresponding quantization parameters can be obtained for each network layer.
  • the quantization parameter corresponding to each network layer may be different.
  • the quantization parameters corresponding to each network layer are used for quantization processing.
  • the model parameters obtained after quantization processing occupy less storage space than the initial model parameters.
  • the above-mentioned process of obtaining the quantization parameters according to the mathematical distribution may be: according to the mathematical distribution, determine the degree of dispersion between the initial model parameters in the target network layer of the neural network model, and then determine, based on that degree of dispersion, the quantization parameters corresponding to the initial model parameters of the target network layer.
  • the target network layer may refer to any network layer in the neural network model, and the quantization parameters of each network layer may be obtained in the above-mentioned manner; for convenience of description, this embodiment takes the acquisition of the quantization parameters of the initial model parameters of the target network layer as an example.
  • the degree of dispersion between the initial model parameters can be characterized by variance or standard deviation.
  • when the degree of dispersion is represented by the standard deviation, the standard deviation between the initial model parameters in the target network layer can be calculated, and the corresponding quantization parameters can then be determined based on that standard deviation.
  • the standard deviation can be computed as SD = sqrt( (1/N) · Σᵢ (xᵢ − x̄)² ), where SD represents the standard deviation, N represents the number of initial model parameters in the target network layer, xᵢ represents an initial model parameter, and x̄ represents the mean of the initial model parameters.
  • the variance between the initial model parameters of the target network layer can be calculated, and then the quantization parameters corresponding to the initial model parameters of the target network layer can be determined based on the variance.
  • the variance can be computed as S = (1/N) · Σᵢ (xᵢ − x̄)², where S represents the variance, N represents the number of initial model parameters in the target network layer, xᵢ represents an initial model parameter, and x̄ represents the mean of the initial model parameters.
  • the standard deviation or variance can be directly used as a quantization parameter.
  • the standard deviation or variance can also be processed further, for example by calculating the product of the standard deviation or variance between the initial model parameters and a preset coefficient to obtain the first value of the target network layer, and determining the first value as the quantization parameter corresponding to the initial model parameters of the target network layer; the preset coefficient allows the standard deviation to be adjusted accordingly, so that a more reasonable quantization parameter can be obtained.
  • the preset coefficient can be flexibly set according to actual needs, for example to 0.9 or 0.8; in specific applications it is not limited to these values, and the preset coefficient can be adaptively adjusted according to actual requirements.
  • the corresponding quantization parameters are determined by the degree of dispersion between the initial model parameters in the neural network model, so that the obtained quantization parameters are more reasonable, and the initial model parameters can be quantized to a reasonable range.
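As an illustrative sketch of the dispersion-based scheme described above (the function name and the 0.9 default for the preset coefficient are assumptions for illustration, not values from this application):

```python
import numpy as np

def dispersion_quant_param(layer_params, coeff=0.9):
    """Quantization parameter for one network layer, derived from the degree
    of dispersion (population standard deviation) of its initial model
    parameters; `coeff` plays the role of the preset coefficient."""
    params = np.asarray(layer_params, dtype=np.float64)
    sd = float(np.sqrt(np.mean((params - params.mean()) ** 2)))
    return coeff * sd  # the "first value" used as the quantization parameter
```

Using the variance instead of the standard deviation, which the text also permits, would simply replace `sd` with `sd ** 2`.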
  • the above-mentioned way of determining the quantization parameters may be for the case where the mathematical distribution corresponding to the initial model parameters of the network layer in the neural network model is a normal distribution.
  • the quantization parameter can also be obtained in the above-mentioned manner (that is, the quantization parameter is obtained based on the standard deviation).
  • the initial model parameters of each network layer can obtain a corresponding quantization parameter, and when performing parameter quantization, the quantization parameter corresponding to each network layer is used to quantize the initial model parameters of that network layer.
  • the method of calculating the quantization parameter can also be as shown in the above embodiment, that is, the standard deviation between all the initial model parameters can be obtained, and the value obtained by multiplying that standard deviation by the preset coefficient is used as the quantization parameter.
  • all the initial model parameters correspond to a quantization parameter, that is, the quantization parameter is used for the quantization process for the initial model parameters of each network layer.
  • the method of obtaining the quantization parameters for initial model parameters of different parameter types can also be as shown in the above embodiment, that is, for the initial model parameters of a certain parameter type, the standard deviation between the initial model parameters of this type can be obtained, and the value obtained by multiplying that standard deviation by the preset coefficient is used as a quantization parameter, which can then be used for the quantization processing of the initial model parameters of this type. In this way, a quantization parameter can be obtained for each type of initial model parameter.
  • the above-mentioned method of determining the corresponding quantization parameter according to the mathematical distribution may also be: according to the mathematical distribution, determining the mean value of the initial model parameters in the corresponding network layer of the neural network model (also referred to as the target network layer, that is, the network layer whose initial model parameters need to be quantized), and determining, based on the mean value, the quantization parameter corresponding to the initial model parameters of the target network layer.
  • the mean value can be directly used as the quantization parameter.
  • the mean value may also refer to the mean of the absolute values of the initial model parameters, and this absolute-value mean may likewise be used as the quantization parameter.
  • the product between the mean value and the preset coefficient may be calculated to obtain the second value corresponding to the network layer, and the second value may be determined as the quantization parameter corresponding to the initial model parameter of the target network layer.
  • the mean value can be adjusted correspondingly through the preset coefficient, so that a more reasonable quantization parameter can be obtained.
  • the preset coefficient may be the same as or different from the preset coefficient in the foregoing embodiment, and it may also be flexibly set according to actual requirements, such as 0.9 or 0.8.
  • the corresponding quantization parameter is determined by the mean value between the initial model parameters, so that the obtained quantization parameter is more reasonable, and then the model parameter can be quantized to a reasonable range.
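A comparable sketch of the mean-based scheme above (the function name, keyword arguments, and the 0.9 coefficient are again illustrative assumptions):

```python
import numpy as np

def mean_quant_param(layer_params, coeff=0.9, use_abs=True):
    """Quantization parameter for one network layer, derived from the mean
    (by default, the mean of absolute values) of its initial model
    parameters; `coeff` plays the role of the preset coefficient."""
    params = np.asarray(layer_params, dtype=np.float64)
    m = float(np.mean(np.abs(params))) if use_abs else float(params.mean())
    return coeff * m  # the "second value" used as the quantization parameter
```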
  • the above-mentioned way of determining the quantization parameters can be aimed at the situation that the mathematical distribution corresponding to the initial model parameters of all network layers in the neural network model is a half-normal distribution.
  • different mathematical distributions can also be used for different network layers.
  • the quantization parameters of the corresponding network layers can be obtained in the above manner (for example, the quantization parameters are obtained based on the mean value, etc.).
  • when the mathematical distribution corresponding to the initial model parameters of other network layers is another distribution, such as the above-mentioned normal distribution, reference may be made to the method in the above embodiment of obtaining the quantization parameter based on the standard deviation.
  • the method of calculating the quantization parameter is also as shown in the above embodiment, that is, the mean of the absolute values of all the initial model parameters can be obtained, and the value obtained by multiplying that absolute-value mean by the preset coefficient is used as the quantization parameter; all the initial model parameters then correspond to one quantization parameter, which is used for the quantization processing of the initial model parameters of each network layer.
  • the method of obtaining the quantization parameters for initial model parameters of different parameter types can also be as shown in the above embodiment, that is, for the initial model parameters of a certain parameter type, the mean of the absolute values of the initial model parameters of this type can be obtained, and the value obtained by multiplying that absolute-value mean by the preset coefficient is used as a quantization parameter, which can then be used for the quantization processing of the initial model parameters of this type. In this way, a quantization parameter can be obtained for each type of initial model parameter.
  • the quantization parameters corresponding to the initial model parameters can also be obtained in other ways.
  • for different mathematical distributions, corresponding methods of obtaining the quantization parameters can be used; that is, a calculation method of the quantization parameter can be preset for each mathematical distribution. The calculation methods of the quantization parameters corresponding to different mathematical distributions may differ, or certain mathematical distributions may share the same calculation method, which can be flexibly set according to the actual situation.
  • the quantization parameters can be used to quantize the initial model parameters, so that the initial model parameters can be quantized to an appropriate range, so that the quantized neural network model has higher accuracy.
  • the initial model parameter is a floating-point parameter
  • the process of quantizing the initial model parameters by using the quantization parameter may be: converting each initial model parameter in the corresponding network layer into an integer parameter based on the quantization parameter to obtain the updated neural network model, where the integer parameters are the quantized model parameters.
  • the quantization method can be: rounding (initial model parameter / quantization parameter) to an integer within the range of the given bit number. For example, if an initial model parameter is 0.95 and it is quantized to 8 bits, the quantization range is [-128, 127]; if the quantization parameter is also 0.95, the integer parameter corresponding to the quantized model parameter is 1, which converted to 8-bit form is 00000001; if the quantization parameter is 0.05, the integer parameter corresponding to the quantized model parameter is 19, which converted to 8-bit form is 00010011.
  • the quantization method may also be an initial model parameter*quantization parameter*bit number, and different quantization methods may be set according to different actual requirements.
  • different quantization methods can also be set for different types of initial model parameters, or different quantization methods can be set for the initial model parameters of each network layer.
  • the above-mentioned number of bits is determined according to the number of bits of the integer data to be quantized, and it can also be set according to the needs of the user.
  • the initial model parameters are floating-point data, and the quantized model parameters are integer data. Floating-point data can record information after the decimal point, which gives the neural network model higher precision, while integer data does not record information after the decimal point and therefore occupies less storage space; moreover, when the neural network model computes with integer data, the calculation speed is faster. In this application, the above parameter quantization method quantizes the model parameters into a reasonable range, so that the accuracy of the neural network model is relatively improved.
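The worked 8-bit example above can be sketched as follows (a minimal illustration of the rounding-and-clamping step; the function name is an assumption):

```python
def quantize_param(value, quant_param, bits=8):
    """Round value/quant_param to the nearest integer and clamp it to the
    signed range of `bits` bits, e.g. [-128, 127] for 8 bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    q = round(value / quant_param)
    return max(lo, min(hi, q))
```

With an initial model parameter of 0.95, this yields 1 when the quantization parameter is 0.95 and 19 when it is 0.05, matching the example in the text.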
  • the above-mentioned initial model parameters may be obtained during the training of the neural network model, or after the training of the neural network model is completed. If they are obtained during the training process, then after the quantized model parameters are obtained, the quantized model parameters can also be converted back into the corresponding floating-point data and then used in model training, in order to improve the training accuracy of the neural network model and obtain more accurate training results. For example, converting the above-mentioned integer data 1 to floating-point data gives 1.000, which can help improve the training accuracy of the neural network model during the training process.
  • the updated neural network model can also be trained to obtain a trained neural network model; that is, the neural network model can be retrained so that the performance of the trained neural network model is improved.
  • the obtained trained neural network model can be applied to various application scenarios, such as image recognition, vehicle detection, intelligent monitoring and other scenarios.
  • the obtained trained neural network model can be sent to the terminal device, so that a neural network model that occupies less storage space can be deployed on the terminal device, thereby meeting the need to deploy neural network models on terminal devices.
  • FIG. 3 is a structural block diagram of an apparatus 200 for determining a neural network model according to an embodiment of the present application.
  • the apparatus 200 may be a module, a program segment, or code on an electronic device. It should be understood that the apparatus 200 corresponds to the above-mentioned method embodiment of FIG. 2 and can perform the various steps involved in that method embodiment; for the specific functions of the apparatus 200, reference may be made to the above description, and to avoid repetition, the detailed description is appropriately omitted here.
  • the apparatus 200 includes:
  • a model parameter obtaining module 210 configured to obtain initial model parameters in the neural network model
  • a mathematical distribution determination module 220 configured to determine the mathematical distribution corresponding to the initial model parameters
  • a quantization parameter determination module 230 configured to determine the quantization parameter corresponding to the initial model parameter according to the mathematical distribution
  • the model determination module 240 is configured to perform quantization processing on the initial model parameters in the neural network model by using the quantization parameters to obtain an updated neural network model.
  • the quantization parameter determination module may include: a first quantization parameter calculation module and a first quantization parameter determination module, wherein,
  • a first quantitative parameter calculation module configured to determine the degree of dispersion between initial model parameters in the target network layer of the neural network model according to the mathematical distribution
  • a first quantization parameter determination module configured to determine the quantization parameter corresponding to the initial model parameter of the target network layer based on the degree of dispersion.
  • the first quantization parameter calculation module includes a standard deviation calculation module, configured to calculate, according to the mathematical distribution, the standard deviation between the initial model parameters in the target network layer of the neural network model, the standard deviation being used to characterize the degree of dispersion.
  • the first quantization parameter determination module includes: a first quantization parameter calculation module and a first quantization parameter determination submodule, wherein,
  • a first quantization parameter calculation module configured to calculate the product of the standard deviation between the initial model parameters of the target network layer and the preset coefficient according to the mathematical distribution to obtain the first numerical value
  • the first quantization parameter determination sub-module is configured to determine the first value as the quantization parameter corresponding to the initial model parameter of the target network layer.
  • the quantization parameter determination module includes: a second quantization parameter calculation module and a second quantization parameter determination module, wherein,
  • a second quantization parameter calculation module configured to determine the mean value of the initial model parameters in the target network layer of the neural network model according to the mathematical distribution
  • the second quantization parameter determination module is configured to determine the quantization parameter of the initial model parameter of the target network layer based on the mean value.
  • the second quantization parameter determination module includes: a second quantization parameter calculation module and a second quantization parameter determination submodule, wherein,
  • a second quantization parameter calculation module configured to calculate the product between the mean value and the preset coefficient to obtain the second value
  • the second quantization parameter determination sub-module is configured to determine the second value as the quantization parameter of the initial model parameter of the target network layer.
  • the quantization parameter determination module 230 is configured to determine the degree of dispersion between the initial model parameters in the target network layer of the neural network model according to the mathematical distribution, and to determine, based on the degree of dispersion, the quantization parameter corresponding to the initial model parameters of the target network layer.
  • the quantization parameter determination module 230 is configured to calculate and obtain a standard deviation between initial model parameters in the target network layer of the neural network model, where the standard deviation is used to characterize the degree of dispersion.
  • the quantization parameter determination module 230 is configured to calculate the product of the standard deviation between the initial model parameters and the preset coefficient to obtain a first value; determine the first value as the target The quantization parameters corresponding to the initial model parameters of the network layer.
  • the quantization parameter determination module 230 is configured to determine the mean value of the initial model parameters in the target network layer of the neural network model according to the mathematical distribution, and to determine, based on the mean value, the quantization parameter corresponding to the initial model parameters of the target network layer.
  • the mean value is the mean of the absolute values of the initial model parameters
  • the quantization parameter determination module 230 is configured to calculate the product between the mean value and the preset coefficient to obtain a second value, and to determine the second value as the quantization parameter corresponding to the initial model parameters of the target network layer.
  • the mathematical distributions corresponding to the initial model parameters of each network layer of the neural network model are different.
  • the types of mathematical distributions corresponding to at least two network layers in the neural network model are different.
  • the initial model parameter is a weight type parameter, an intermediate result type parameter or an output value type parameter, and the mathematical distributions corresponding to different types of initial model parameters are different.
  • the initial model parameter is a floating-point parameter
  • the model determination module 240 is configured to convert each initial model parameter in the corresponding network layer into an integer parameter based on the quantization parameter to obtain an updated neural network model.
  • the apparatus 200 further includes:
  • the model training module is used to train the updated neural network model to obtain the trained neural network model.
  • the mathematical distribution includes at least one of normal distribution, half-normal distribution, Bernoulli distribution, binomial distribution, multinomial distribution, uniform distribution, exponential distribution, and sampling distribution.
  • Embodiments of the present application provide a computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the method process performed by the electronic device in the method embodiment shown above is executed.
  • the embodiment of the present application further discloses a computer program product
  • the computer program product includes a computer program stored on a non-transitory computer-readable storage medium
  • the computer program includes program instructions
  • the computer can execute the methods provided by the above method embodiments, for example, including: obtaining initial model parameters in the neural network model; determining the mathematical distribution corresponding to the initial model parameters; determining the initial model parameters according to the mathematical distribution quantization parameters corresponding to the model parameters; using the quantization parameters to perform quantization processing on the initial model parameters in the neural network model to obtain an updated neural network model.
  • the embodiment of the present application further discloses a computer program product, comprising computer program code, when the computer program code is executed on an electronic device, the electronic device executes the above-mentioned neural network model determination method.
  • the embodiments of the present application provide a method, apparatus, electronic device, and readable storage medium for determining a neural network model.
  • the initial model parameters are quantized by using the quantization parameters to obtain the updated neural network model. By determining the mathematical distribution of the initial model parameters, some mathematical regularities of the model parameters in each network layer can be taken into account, so that after quantization the initial model parameters fall within a reasonable range, which can reduce the storage space occupied by the initial model parameters and improve the calculation speed and accuracy of the updated neural network model.
  • the disclosed apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some communication interfaces, indirect coupling or communication connection of devices or units, which may be in electrical, mechanical or other forms.
  • units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional module in each embodiment of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.


Abstract

The present application provides a neural network model determination method and apparatus, an electronic device, and a readable storage medium, relating to the technical field of information processing. The method includes: obtaining initial model parameters in a neural network model; determining the mathematical distribution corresponding to the initial model parameters; determining, according to the mathematical distribution, the quantization parameter corresponding to the initial model parameters; and performing quantization processing on the initial model parameters in the neural network model by using the quantization parameter to obtain an updated neural network model. In the present application, the initial model parameters in the neural network model are quantized according to the determined mathematical distribution of the initial model parameters, so that the initial model parameters are quantized within a reasonable range, which not only reduces the storage space of the model parameters but also improves the calculation speed and accuracy of the neural network model.

Description

Neural network model determination method and apparatus, electronic device, medium, and product
This application claims priority to Chinese Patent Application No. 202010748015.1, filed with the Chinese Patent Office on July 29, 2020 and entitled "Neural network model determination method and apparatus, electronic device, and readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of information processing, and in particular, to a neural network model determination method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
With the rapid development of research on neural-network-related technologies, a large number of such technologies have emerged in related fields, such as convolutional neural networks applied in the field of vision and recurrent neural networks applied in speech recognition or natural language processing. These neural network technologies have greatly improved the processing accuracy in the corresponding fields.
The parameters of a neural network model are usually on the order of millions, tens of millions, or hundreds of millions, which places high demands on computing and storage devices. In particular, when a neural network model is deployed in mobile terminal applications, such as access control systems, shopping mall monitoring, subway entrances, and mobile phones, it consumes too much of the mobile terminal's computing resources and memory. Therefore, in order to obtain networks that are more efficient and can be deployed on mobile terminals, neural network compression algorithms have become a research hotspot. Common network compression approaches include quantization, pruning, and low-rank decomposition.
Among them, quantization refers to converting floating-point network parameters into integer network parameters, which can reduce the storage space of the parameters. However, the quantization methods in the prior art mainly quantize based on the amplitude range of the network parameters. Although the model parameters obtained by this kind of quantization can reduce the storage space of the parameters and increase the calculation speed of the neural network, they also lower the accuracy of the neural network model.
Therefore, in the related art, how to improve the accuracy of a neural network model while reducing the storage space of its model parameters is a technical problem that remains to be solved.
Summary
The purpose of the embodiments of the present application is to provide a neural network model determination method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product, so as to at least solve the technical problem in the prior art that reducing the storage space of neural network model parameters lowers the accuracy of the neural network model.
In a first aspect, an embodiment of the present application provides a neural network model determination method, the method including:
obtaining initial model parameters in a neural network model;
determining the mathematical distribution corresponding to the initial model parameters;
determining, according to the mathematical distribution, the quantization parameter corresponding to the initial model parameters;
performing quantization processing on the initial model parameters in the neural network model by using the quantization parameter to obtain an updated neural network model.
In the above implementation process, the corresponding quantization parameter is determined according to the mathematical distribution corresponding to the initial model parameters in the neural network model, and the quantization parameter is then used to quantize the initial model parameters to obtain the updated neural network model. In this way, by determining the mathematical distribution of the initial model parameters, the present application can take into account certain mathematical regularities of the model parameters in each network layer, so that the quantized model parameters fall within a reasonable range, thereby reducing the storage space of the model parameters and improving the calculation speed and accuracy of the updated neural network model.
Optionally, determining, according to the mathematical distribution, the quantization parameter corresponding to the initial model parameters includes:
determining, according to the mathematical distribution, the degree of dispersion between the initial model parameters in a target network layer of the neural network model;
determining the quantization parameter of the initial model parameters of the target network layer based on the degree of dispersion.
In the above implementation process, the corresponding quantization parameter is determined by considering the degree of dispersion between the initial model parameters, so that the obtained quantization parameter is more reasonable and the initial model parameters can be quantized within a reasonable range.
Optionally, determining, according to the mathematical distribution, the degree of dispersion between the initial model parameters in the target network layer of the neural network model includes:
calculating, according to the mathematical distribution, the standard deviation between the initial model parameters in the target network layer of the neural network model, the standard deviation being used to characterize the degree of dispersion.
Optionally, determining, based on the degree of dispersion, the quantization parameter corresponding to the initial model parameters of the target network layer includes:
calculating, according to the mathematical distribution, the product of the standard deviation between the initial model parameters and a preset coefficient to obtain a first value;
determining the first value as the quantization parameter corresponding to the initial model parameters of the target network layer.
In the above implementation process, the standard deviation can be adjusted accordingly through the preset coefficient, so that a more reasonable quantization parameter can be obtained.
Optionally, determining, according to the mathematical distribution, the quantization parameter corresponding to the initial model parameters of the target network layer in the neural network model includes:
determining, according to the mathematical distribution, the mean value of the initial model parameters in the target network layer of the neural network model;
determining the quantization parameter corresponding to the initial model parameters of the target network layer based on the mean value.
In the above implementation process, the corresponding quantization parameter is determined by considering the mean value of the initial model parameters, so that the obtained quantization parameter is more reasonable and the model parameters can be quantized within a reasonable range.
Optionally, when the mean value is the mean of the absolute values of the initial model parameters, determining the quantization parameter corresponding to the initial model parameters of the target network layer based on the mean value includes:
calculating the product of the mean value and a preset coefficient to obtain a second value of the target network layer;
determining the second value as the quantization parameter corresponding to the initial model parameters of the target network layer.
In the above implementation process, the mean value can be adjusted accordingly through the preset coefficient, so that a more reasonable quantization parameter can be obtained.
Optionally, determining the mathematical distribution corresponding to the initial model parameters of the at least one network layer includes: determining the mathematical distribution corresponding to the initial model parameters of each network layer of the neural network model, wherein the types of mathematical distributions corresponding to at least two network layers in the neural network model are different.
In this way, different quantization parameters can be obtained for the initial model parameters of different network layers, so that the initial model parameters of different network layers can each be quantized within a more reasonable range.
Optionally, the initial model parameters are weight-type parameters, intermediate-result-type parameters, or output-value-type parameters, and different types of initial model parameters correspond to different mathematical distributions.
In this way, the calculation methods of the quantization parameters corresponding to different types of initial model parameters can differ, so that more reasonable quantization parameters can be obtained for different types of initial model parameters.
Optionally, the initial model parameters are floating-point parameters, and performing quantization processing on the initial model parameters in the neural network model by using the quantization parameter to obtain the updated neural network model includes:
converting each initial model parameter in the corresponding network layer into an integer parameter based on the quantization parameter to obtain the updated neural network model, where the integer parameters are the quantized model parameters.
In the above implementation process, the quantization parameter is used to quantize the floating-point model parameters into integer model parameters, which can reduce the storage space of the parameters and increase the calculation speed of the neural network model.
Optionally, after obtaining the updated neural network model, the method further includes:
training the updated neural network model to obtain a trained neural network model, so that the training accuracy of the neural network model can be further improved.
Optionally, the mathematical distribution includes at least one of a normal distribution, a half-normal distribution, a Bernoulli distribution, a binomial distribution, a multinomial distribution, a uniform distribution, an exponential distribution, and a sampling distribution.
In a second aspect, an embodiment of the present application provides a neural network model determination apparatus, the apparatus including:
a model parameter obtaining module, configured to obtain initial model parameters in a neural network model;
a mathematical distribution determination module, configured to determine the mathematical distribution corresponding to the initial model parameters;
a quantization parameter determination module, configured to determine, according to the mathematical distribution, the quantization parameter corresponding to the initial model parameters;
a model determination module, configured to perform quantization processing on the initial model parameters in the neural network model by using the quantization parameter to obtain an updated neural network model.
Optionally, the quantization parameter obtaining module includes:
a first quantization parameter calculation module, configured to determine, according to the mathematical distribution, the degree of dispersion between the initial model parameters in a target network layer of the neural network model;
a first quantization parameter determination module, configured to determine the quantization parameter corresponding to the initial model parameters of the target network layer based on the degree of dispersion.
Optionally, the first quantization parameter calculation module is specifically configured to calculate the standard deviation between the initial model parameters in the target network layer of the neural network model, the standard deviation being used to characterize the degree of dispersion.
Optionally, the first quantization parameter calculation module includes:
a first quantization parameter calculation module, configured to calculate the product of the standard deviation between the initial model parameters and a preset coefficient to obtain a first value of the target network layer;
a first quantization parameter determination submodule, configured to determine the first value of the target network layer as the quantization parameter corresponding to the initial model parameters of the target network layer.
Optionally, the quantization parameter determination module includes:
a second quantization parameter calculation module, configured to determine, according to the mathematical distribution, the mean value of the initial model parameters in the target network layer of the neural network model;
a second quantization parameter determination module, configured to determine the quantization parameter corresponding to the initial model parameters of the target network layer based on the mean value.
Optionally, when the mean value obtained by the second quantization parameter calculation module is the mean of the absolute values of the initial model parameters, the second quantization parameter determination module includes:
a second quantization parameter calculation module, configured to calculate the product of the mean value and a preset coefficient to obtain a second value of the corresponding network layer;
a second quantization parameter determination submodule, configured to determine the second value as the quantization parameter corresponding to the initial model parameters of the target network layer.
Optionally, determining the mathematical distribution corresponding to the initial model parameters of the at least one network layer includes:
determining the mathematical distribution corresponding to the initial model parameters of each network layer of the neural network model, wherein the types of mathematical distributions corresponding to at least two network layers in the neural network model are different.
Optionally, the initial model parameters obtained by the model parameter obtaining module are weight-type parameters, intermediate-result-type parameters, or output-value-type parameters, and different types of initial model parameters correspond to different mathematical distributions.
Optionally, the initial model parameters obtained by the model parameter obtaining module are floating-point parameters, and the model determination module is configured to convert each initial model parameter in the corresponding target network layer into an integer parameter based on the quantization parameter to obtain the updated neural network model, where the integer parameters are the quantized model parameters.
Optionally, the apparatus further includes:
a model training module, configured to train the neural network model updated by the model determination module to obtain a trained neural network model.
Optionally, the mathematical distribution includes at least one of a normal distribution, a half-normal distribution, a Bernoulli distribution, a binomial distribution, a multinomial distribution, a uniform distribution, an exponential distribution, and a sampling distribution.
In a third aspect, an embodiment of the present application provides an electronic device including a processor and a memory, the memory storing computer-readable instructions which, when executed by the processor, perform the steps of the neural network model determination method provided in the first aspect above.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, performing the steps of the neural network model determination method described above.
In a fifth aspect, an embodiment of the present application provides a computer program product including computer program code which, when run on an electronic device, causes the electronic device to execute the neural network model determination method described above.
In the embodiments of the present application, the initial model parameters in a neural network model are first obtained; the mathematical distribution corresponding to the initial model parameters is determined; the quantization parameter corresponding to the initial model parameters is determined according to the mathematical distribution; and one or more of the initial model parameters in the neural network model are quantized by using the quantization parameter to obtain an updated neural network model. In the present application, by determining the mathematical distribution of the initial model parameters of one or more network layers in the neural network model, the initial model parameters of those one or more network layers are quantized, so that they are quantized within a reasonable range, which not only reduces the storage space of the model parameters of the neural network model but also improves its calculation speed and accuracy.
Other features and advantages of the present application will be set forth in the following description and will in part become apparent from the description or be understood by implementing the embodiments of the present application. The objectives and other advantages of the present application can be realized and obtained through the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the accompanying drawings required in the embodiments of the present application are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and therefore should not be regarded as limiting the scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
FIG. 1 is a schematic structural diagram of an electronic device for executing a neural network model determination method according to an embodiment of the present application;
FIG. 2 is a flowchart of a neural network model determination method according to an embodiment of the present application;
FIG. 3 is a structural block diagram of a neural network model determination apparatus according to an embodiment of the present application.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application.
As the number of network layers of neural network models increases, the number of parameters of neural network models also grows, which places high demands on the storage space for the neural network model. In some application scenarios of neural network models, the neural network model can be compressed in order to minimize the storage space it occupies. For example, when a neural network model is deployed on terminal devices whose storage space is relatively limited, running the neural network model would occupy too much of the terminal's storage space, leaving the terminal unable to provide sufficient storage space for other processing flows. Therefore, so that the neural network model occupies as little storage space as possible when deployed on the terminal, the model parameters can be quantized to compress the neural network model, achieving the effect that the neural network model occupies as little storage space on the terminal as possible, thereby reserving more storage space on the terminal for other processing flows.
An embodiment of the present application provides a neural network model determination method in which the corresponding quantization parameter is determined according to the mathematical distribution corresponding to the initial model parameters in a neural network model, and the quantization parameter is then used to quantize the initial model parameters to obtain an updated neural network model. In the present application, by determining the mathematical distribution of the initial model parameters of the neural network model, the initial model parameters of the neural network model are quantized, so that the initial model parameters of the corresponding network layers in the neural network model are quantized within a reasonable range, which not only reduces the storage space of the model parameters of the neural network model but also improves its calculation speed and accuracy.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of an electronic device for executing a neural network model determination method according to an embodiment of the present application. The electronic device may include: at least one processor 110, such as a CPU, at least one communication interface 120, at least one memory 130, and at least one communication bus 140. The communication bus 140 is used to realize direct connection and communication among these components. The communication interface 120 of the device in this embodiment of the present application is used for signaling or data communication with other node devices. The memory 130 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. Optionally, the memory 130 may also be at least one storage device located remotely from the aforementioned processor. The memory 130 stores computer-readable instructions; when the computer-readable instructions are executed by the processor 110, the electronic device performs the method process shown in FIG. 2 below. For example, the memory 130 may be used to store the initial model parameters in a neural network model; when quantizing the initial model parameters, the processor 110 may obtain from the memory 130 the initial model parameters of the neural network model, which may be the initial model parameters of one target network layer in the neural network model or of multiple target network layers, without limitation in this embodiment; the processor then determines the mathematical distribution corresponding to these initial model parameters, determines the quantization parameter of the corresponding target network layer in the neural network model according to the mathematical distribution, and quantizes the initial model parameters by using the quantization parameter to obtain the quantized model parameters. The quantized model parameters and the resulting updated neural network model may also be stored in the memory 130.
The electronic device may be a terminal device or a server. When the neural network model is deployed on a terminal device, the electronic device is the terminal device; when the neural network model is deployed on a server, the electronic device is the server. Of course, even when the neural network model is deployed on a terminal device, the electronic device may also be a server, in which case the server can communicate with the terminal device through a network: the terminal device sends the obtained model parameters to the server, the server quantizes the model parameters, and the quantized model parameters are returned to the terminal device.
It can be understood that the structure shown in FIG. 1 is merely illustrative; the electronic device may include more or fewer components than shown in FIG. 1 or have a configuration different from that shown in FIG. 1. The components shown in FIG. 1 may be implemented in hardware, software, or a combination thereof.
Referring to FIG. 2, FIG. 2 is a flowchart of a neural network model determination method according to an embodiment of the present application. The method includes the following steps:
Step S110: Obtain initial model parameters in a neural network model.
In a specific implementation process, the obtained initial model parameters may be the initial model parameters of one network layer in the neural network model or of multiple network layers, where either the one network layer or each of the multiple network layers may be referred to as a target network layer. This embodiment of the present application takes traversing the network layers of the neural network model to obtain the initial model parameters of each network layer as an example. For example, the network layers may be traversed in order from the input to the output of the neural network model, from layers at shallower network depth to layers at deeper network depth, that is, layer by layer from front to back; or they may be traversed in order from the output to the input of the neural network model, from deeper layers to shallower layers, that is, layer by layer from back to front. In this way, the initial model parameters of each network layer can be obtained through traversal.
The parameter types of the initial model parameters may include, but are not limited to: weights (such as the parameters included in convolutional layers), intermediate results (such as feature maps and feature vectors), and output values.
步骤S120:确定所述初始模型参数所对应的数学分布。
由于神经网络模型的初始模型参数默认是浮点型,其可能包含较长的小数位数,导致其存储时占据的内存空间较大,所以,为了减少参数所占据的内存空间,提高神经网络模型的计算速率,则可以对每个初始 模型参数进行量化处理。其中,对初始模型参数进行量化是指将浮点型参数转换为某个取值范围内的整数型参数。虽然转换成整数型参数后其模型参数的存储空间相应减少了,但是神经网络模型的精度也相应降低了。因此,本申请实施例中为了确保在减少参数存储空间的同时也确保神经网络模型的精度较高,可以利用初始模型参数所对应的数学分布来获得一个较为合理的量化参数,即可以先确定神经网络模型中初始模型参数所对应的数学分布。比如,根据需要确定神经网络模型中第一网络层的初始模型参数所对应的数学分布,当然,也可以确定神经网络模型中其他网络层的初始模型参数所对应的数学分布,本实施例不做限制。
The above mathematical distribution may be preset; for example, it may include at least one of a normal distribution, half-normal distribution, Bernoulli distribution, binomial distribution, multinomial distribution, uniform distribution, exponential distribution, sampling distribution, and so on. Extensive practice has led to the following conclusion: when quantization parameters are determined on the assumption that the initial model parameters follow different mathematical distributions, the accuracy of the neural network models obtained after quantizing with those parameters differs, but in every case the accuracy of the quantized model is improved compared with the original model. Therefore, the mathematical distribution that the initial model parameters are assumed to follow can simply be set as required; the mathematical distribution may also include other distributions, such as the Poisson distribution.
In some implementations, the initial model parameters refer to the parameters of one network layer of the neural network model; they may also denote the parameters of each of multiple network layers, or of every network layer. The parameters of these network layers may all follow one uniform type of mathematical distribution, or follow different types of distributions; for example, the parameters may all follow a normal distribution or a half-normal distribution.
In this implementation, the user may input into the electronic device the mathematical distribution corresponding to the initial model parameters, such as a normal distribution; after obtaining the initial model parameters of the neural network model, the electronic device can thus determine that their corresponding distribution is a normal distribution. Alternatively, the electronic device may store the corresponding distribution in advance, e.g., store that the distribution corresponding to the initial model parameters is a normal distribution; the electronic device can then simply look up the distribution corresponding to the initial model parameters of the neural network model after obtaining them, thereby determining the corresponding mathematical distribution.
In some implementations, the mathematical distributions corresponding to the initial model parameters of different network layers of the neural network model may differ, or may be partially the same. For example, the distribution corresponding to the initial model parameters of the first network layer is a normal distribution, that of the second network layer is a half-normal distribution, that of the third network layer is a normal distribution, and so on. That is, the distributions corresponding to the initial model parameters of the network layers may be partially the same and partially different, so that different quantization parameters are obtained for the initial model parameters of different network layers, and the parameters of each layer can be quantized into a more reasonable range. In some implementations, the types of the mathematical distributions corresponding to at least two network layers of the neural network model differ.
In this implementation, a corresponding mathematical distribution may be preset for the initial model parameters of one or more network layers of the neural network model, so that after obtaining the initial model parameters of each network layer in turn, the electronic device can determine the distribution corresponding to each layer's initial model parameters.
In some implementations, the initial model parameters of each network layer of the neural network model may include multiple parameter types, such as weight-type parameters, intermediate-result-type parameters, and output-value-type parameters; an initial model parameter may thus be a weight-type, intermediate-result-type, or output-value-type parameter. The mathematical distributions corresponding to different types of initial model parameters may be the same or different; for example, weight-type parameters may correspond to a normal distribution, intermediate-result-type parameters to a half-normal distribution, and output-value-type parameters to a binomial distribution.
In this implementation, after obtaining the initial model parameters, the electronic device may identify their parameter type and then look up the mathematical distribution corresponding to that type, thereby determining the distribution corresponding to the initial model parameters.
It can be understood that the mathematical distribution corresponding to the initial model parameters should be understood as an assumption that the initial model parameters follow a certain distribution, not a fact that they actually do. Computing the quantization parameter based on the assumed distribution makes the computed quantization parameter fit the initial model parameters better and makes it more reasonable, so that after quantization the values of the initial model parameters do not change too much and the quantized model parameters lie within a reasonable range, which not only reduces the storage space of the model parameters but also improves the accuracy of the neural network model.
Step S130: determining, according to the mathematical distribution, the quantization parameter corresponding to the initial model parameters.
Here, the quantization parameter can be understood as a quantization range or quantization scale for the initial model parameters; that is, using the quantization parameter, the initial model parameters can be quantized into a suitable range.
Since the value of the quantization parameter obtained from the mathematical distribution corresponding to the initial model parameters is related to the values of those parameters, a corresponding quantization parameter is obtained for the initial model parameters of the neural network model.
In this embodiment, determining, according to the mathematical distribution, the quantization parameter of the initial model parameters of the corresponding network layer can be done in the following two ways.
One way to determine the quantization parameter corresponding to the initial model parameters is:
First, according to the mathematical distribution, determine the degree of dispersion among the initial model parameters in the corresponding network layer of the neural network model; specifically, according to the mathematical distribution, compute the standard deviation of the initial model parameters in the corresponding network layer, the standard deviation being used to characterize the degree of dispersion.
Then, determine the quantization parameter of the initial model parameters of the corresponding network layer based on the degree of dispersion. Specifically:
First, according to the mathematical distribution, compute the product of the standard deviation of the initial model parameters of the corresponding network layer and a preset coefficient to obtain a first value for that layer; then determine the first value of the layer as the quantization parameter corresponding to the initial model parameters of that layer.
Another way to determine the quantization parameter corresponding to the initial model parameters is:
First, according to the mathematical distribution, determine the mean of the initial model parameters in the corresponding network layer of the neural network model; then determine the quantization parameter of the initial model parameters of the corresponding network layer based on the mean.
Here, when the mean is the mean of the absolute values of the initial model parameters of the target network layer, determining the quantization parameter of the initial model parameters of the corresponding network layer based on the mean includes: computing the product of the mean and a preset coefficient to obtain a second value for the corresponding network layer; and determining the second value as the quantization parameter of the initial model parameters of that layer.
It should be noted that the specific processes of these two ways of determining the quantization parameter will be described in the corresponding embodiments below and are not repeated here.
Step S140: quantizing the initial model parameters of the neural network model using the quantization parameter to obtain an updated neural network model.
After the quantization parameter of the target network layer of the neural network model is obtained as above, the initial model parameters of that target network layer can be quantized using the quantization parameter, thereby obtaining a quantized neural network model, i.e., an updated neural network model. The model parameters in the updated model are the initial model parameters after quantization.
In the above implementation, a quantization parameter is determined according to the mathematical distribution corresponding to the initial model parameters of a neural network model, and the initial model parameters are then quantized with it to obtain an updated neural network model. Thus, in the present application, by determining the mathematical distribution of the initial model parameters and quantizing them accordingly, the model parameters of the quantized neural network model lie within a reasonable range, which not only reduces the storage space of the model parameters but also improves the computation speed and accuracy of the neural network model.
When determining the quantization parameters corresponding to the initial model parameters of the neural network model, in order to obtain more reasonable quantization parameters, a quantization parameter can be determined for each network layer of the model, i.e., one quantization parameter per layer, and the quantization parameters of different layers may differ. The initial model parameters of each target network layer are quantized using that layer's quantization parameter, and the quantized parameters occupy less storage space than the initial model parameters.
Optionally, in some implementations, the process of obtaining the quantization parameter from the mathematical distribution may be: according to the distribution, determine the degree of dispersion among the initial model parameters in the target network layer of the neural network model, and then determine the quantization parameter corresponding to the initial model parameters of the target network layer based on that degree of dispersion.
Here, the target network layer may refer to any network layer of the neural network model, and the quantization parameter of each network layer may be obtained in the manner above. For convenience of description, this implementation takes obtaining the quantization parameter of the initial model parameters of a target network layer as an example.
The degree of dispersion among the initial model parameters can be characterized by the variance or the standard deviation. When it is characterized by the standard deviation, the standard deviation of the initial model parameters in the target network layer can be computed, and the corresponding quantization parameter is then determined based on the standard deviation.
The standard deviation is computed as follows:
$$SD = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
where SD denotes the standard deviation, N denotes the number of initial model parameters in the target network layer, x_i denotes an initial model parameter, and μ denotes the mean of the initial model parameters.
If the degree of dispersion is represented by the variance, the variance of the initial model parameters of the target network layer can be computed, and the quantization parameter corresponding to the initial model parameters of the target network layer is then determined based on the variance.
The variance is computed as follows:
$$S = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
where S denotes the variance, N denotes the number of initial model parameters in the target network layer, x_i denotes an initial model parameter, and μ denotes the mean of the initial model parameters.
In some implementations, after the standard deviation or variance of the initial model parameters of the neural network model is obtained, the standard deviation or variance may be used directly as the quantization parameter.
Or, in some cases, directly using the standard deviation or variance as the quantization parameter to quantize the initial model parameters may be unreasonable, so the quantization parameter can be processed first. For example, the product of the standard deviation (or variance) of the initial model parameters and a preset coefficient is computed to obtain a first value for the target network layer, and this first value is determined as the quantization parameter corresponding to the initial model parameters of the target network layer. The preset coefficient allows the standard deviation to be adjusted accordingly, so that a more reasonable quantization parameter can be obtained.
The preset coefficient can be set flexibly according to actual needs, e.g., 0.9 or 0.8; in specific applications it is not limited to these values, and the preset coefficient can be adapted as actually required.
In the above implementation, the quantization parameter is determined from the degree of dispersion among the initial model parameters of the neural network model, so that the obtained quantization parameter is more reasonable, and the initial model parameters can in turn be quantized into a reasonable range.
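The dispersion-based computation above can be sketched as follows; the toy parameter values and the coefficient 0.9 are illustrative assumptions, not values prescribed by the embodiment:

```python
import math

def std_quant_param(params, coeff=0.9):
    """Quantization parameter for one target network layer:
    standard deviation of its initial model parameters (SD in the text)
    multiplied by a preset coefficient."""
    n = len(params)
    mu = sum(params) / n                                # mean of the parameters
    variance = sum((x - mu) ** 2 for x in params) / n   # S in the text
    return math.sqrt(variance) * coeff                  # SD * preset coefficient

layer_params = [0.5, -0.5, 1.5, -1.5]   # toy initial model parameters of one layer
q = std_quant_param(layer_params)       # here SD = sqrt(1.25), q = 0.9 * SD
```

With coeff set to 1.0 the function returns the raw standard deviation, i.e., the variant in which the standard deviation itself is used directly as the quantization parameter.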
It can be understood that the above way of determining the quantization parameter may target the case where the mathematical distribution corresponding to the initial model parameters of a network layer of the neural network model is a normal distribution; of course, for other distributions, such as the half-normal or multinomial distribution, the quantization parameter can also be obtained in the above way (i.e., based on the standard deviation). In this approach, a corresponding quantization parameter is obtained for the initial model parameters of each network layer, and during parameter quantization, the initial model parameters of each layer are quantized using that layer's own quantization parameter.
When a quantization parameter is determined for all initial model parameters of the neural network model, it can also be computed as in the above embodiment: the standard deviation of all initial model parameters is obtained, and the product of the standard deviation and the preset coefficient is used as the quantization parameter. In this case all initial model parameters correspond to one quantization parameter, i.e., the initial model parameters of every network layer are quantized with this single quantization parameter.
The quantization parameters for initial model parameters of different parameter types can also be obtained as in the above embodiment: for a certain parameter type, the standard deviation of the initial model parameters of that type is obtained, the product of the standard deviation and the preset coefficient is used as the quantization parameter, and all initial model parameters of that type are quantized with it. In this way, each type of initial model parameter obtains its own quantization parameter.
As another implementation, the quantization parameter may also be determined from the mathematical distribution as follows: according to the distribution, determine the mean of the initial model parameters in the corresponding network layer (also called the target network layer, i.e., the target network layer whose initial model parameters are to be optimized) of the neural network model, and determine the quantization parameter corresponding to the initial model parameters of the target network layer based on that mean.
Optionally, in some implementations, the mean may be used directly as the quantization parameter. The mean may also refer to the mean of the absolute values of the initial model parameters, and this absolute-value mean may likewise serve as the quantization parameter.
Specifically, the product of the mean and a preset coefficient may be computed to obtain a second value for the corresponding network layer, and the second value is determined as the quantization parameter corresponding to the initial model parameters of the target network layer. The preset coefficient allows the mean to be adjusted accordingly, so that a more reasonable quantization parameter can be obtained.
This preset coefficient may be the same as or different from the preset coefficient in the above embodiment, and can likewise be set flexibly according to actual needs, e.g., 0.9 or 0.8.
In the above implementation, the quantization parameter is determined from the mean of the initial model parameters, so that the obtained quantization parameter is more reasonable, and the model parameters can in turn be quantized into a reasonable range.
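The mean-based computation above can likewise be sketched; again the parameter values and the coefficient 0.9 are illustrative assumptions:

```python
def absmean_quant_param(params, coeff=0.9):
    """Quantization parameter for one target network layer:
    mean of the absolute values of its initial model parameters
    multiplied by a preset coefficient."""
    abs_mean = sum(abs(x) for x in params) / len(params)
    return abs_mean * coeff

layer_params = [0.5, -0.5, 1.5, -1.5]   # toy initial model parameters of one layer
q = absmean_quant_param(layer_params)   # abs-value mean = 1.0, q = 0.9
```

Setting coeff to 1.0 corresponds to the variant in which the absolute-value mean itself is used directly as the quantization parameter.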
It can be understood that the above way of determining the quantization parameter may target the case where the distributions corresponding to the initial model parameters of all network layers of the neural network model are half-normal; of course, different network layers may also use different distributions, such as normal or multinomial, and for each of them the quantization parameter of the corresponding layer can be obtained in the above way (e.g., based on the mean). When the distribution corresponding to the initial model parameters of other network layers is another distribution, such as the normal distribution above, the standard-deviation-based way of obtaining the quantization parameter in the above embodiment can be followed.
When a quantization parameter is determined for all initial model parameters of the neural network model, it can also be computed as in the above embodiment: the mean of the absolute values of all initial model parameters is obtained, and the product of this absolute-value mean and the preset coefficient is used as the quantization parameter. In this case all initial model parameters correspond to one quantization parameter, i.e., the initial model parameters of every network layer are quantized with it.
The quantization parameters for initial model parameters of different parameter types can also be obtained as in the above embodiment: for a certain parameter type, the mean of the absolute values of the initial model parameters of that type is obtained, the product of this absolute-value mean and the preset coefficient is used as the quantization parameter, and all initial model parameters of that type are quantized with it. In this way, each type of initial model parameter obtains its own quantization parameter.
It can be understood that when the distribution corresponding to the initial model parameters is another distribution, such as the binomial distribution, the quantization parameter can also be obtained in other ways. In practical applications, the way of obtaining the quantization parameter can simply be determined according to the distribution corresponding to the initial model parameters; that is, a computation method for the quantization parameter can be preset for each distribution. The computation methods for different distributions may all differ, or several distributions may share the same method, which can be set flexibly according to the actual situation.
In this way, after the quantization parameter is obtained as above, the initial model parameters can be quantized with it, so that they are quantized into a suitable range and the quantized neural network model achieves higher accuracy.
In some implementations, the initial model parameters are floating-point parameters, and quantizing them with the quantization parameter may proceed as follows: based on the quantization parameter, each initial model parameter in the corresponding network is converted into an integer parameter, yielding the updated neural network model; the integer parameters are the quantized model parameters.
In a specific implementation, the quantization may be: round(initial model parameter / quantization parameter * bit width). For example, if an initial model parameter is 0.95 and it is quantized to 8 bits, the quantization range is taken as [-128, 127]; if the quantization parameter is also 0.95, the integer parameter corresponding to the quantized model parameter is 1, which converted to 8 bits is 00000001; if the quantization parameter is 0.05, the integer parameter corresponding to the quantized model parameter is 19, which converted to 8 bits is 00010011.
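The rounding step of the example above can be sketched as follows; the worked values (0.95 with quantization parameters 0.95 and 0.05) come from the text, while the explicit clamp to the signed 8-bit range [-128, 127] is an assumption made explicit here:

```python
def quantize(param, quant_param, bits=8):
    """Round a floating-point initial model parameter to an integer code
    and clamp it to the signed range of the given bit width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1   # [-128, 127] for 8 bits
    code = round(param / quant_param)
    return max(lo, min(hi, code))

q1 = quantize(0.95, 0.95)   # 1, i.e. 00000001 in 8 bits
q2 = quantize(0.95, 0.05)   # 19, i.e. 00010011 in 8 bits
```

A parameter far outside the range, e.g. quantize(1000.0, 0.05), saturates at 127 rather than overflowing, which is one common design choice for keeping codes within the target bit width.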
It should be noted that in other embodiments the quantization may also be: initial model parameter * quantization parameter * bit width; different quantization methods can be set according to different actual needs. Of course, different quantization methods may also be set for different types of initial model parameters, or for the initial model parameters of each network layer.
It should also be noted that the above bit width is determined by the bit width of the integer data to be quantized to, and may also be set according to the user's needs.
It can be understood that the initial model parameters are floating-point data and the quantized model parameters are integer data. Because floating-point data records information after the decimal point, it gives the neural network model higher precision, whereas integer data, recording no information after the decimal point, occupies less storage space, and the model computes faster with integer data. Moreover, through the above parameter quantization, the present application quantizes the model parameters into a reasonable range, so that the accuracy of the neural network model is also relatively improved.
In addition, the above initial model parameters may be obtained during training of the neural network model or after training is completed. If they are obtained during training, then after the quantized model parameters are obtained, in order to improve training precision and obtain more accurate training results, the quantized model parameters may be converted back into corresponding floating-point data to participate in model training; e.g., the integer datum 1 above is converted into the floating-point datum 1.000. This helps improve the training precision of the neural network model during training.
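The conversion back to floating point described above (integer 1 becomes 1.000) reads as a plain type conversion; a minimal sketch under that reading, with the integer codes assumed from the earlier example:

```python
def to_float(codes):
    """Convert quantized integer codes back to floating-point data
    (e.g. 1 -> 1.000) so they can participate in further training."""
    return [float(c) for c in codes]

restored = to_float([1, 19])   # [1.0, 19.0]
```
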
If the initial model parameters are obtained after the neural network model has been trained, then to further improve training precision, the updated neural network model may also be trained to obtain a trained neural network model, i.e., the neural network model is retrained, thereby improving the performance of the resulting trained model. The trained neural network model can be applied in various application scenarios, such as image recognition, vehicle detection, and intelligent monitoring.
If the above method is executed by a server, then after the parameter-quantized neural network model is trained, the resulting trained model can be sent to a terminal device, so that a neural network model occupying less storage space can be deployed on the terminal device, meeting the need to deploy neural network models on terminal devices.
Referring to FIG. 3, FIG. 3 is a structural block diagram of a neural network model determination apparatus 200 provided by an embodiment of the present application; the apparatus 200 may be a module, a program segment, or code on an electronic device. It should be understood that the apparatus 200 corresponds to the method embodiment of FIG. 2 above and can execute each step involved in that method embodiment; for the specific functions of the apparatus 200, reference may be made to the description above, and a detailed description is appropriately omitted here to avoid repetition.
Optionally, the apparatus 200 includes:
a model parameter obtaining module 210, configured to obtain initial model parameters of a neural network model;
a mathematical distribution determination module 220, configured to determine the mathematical distribution corresponding to the initial model parameters;
a quantization parameter determination module 230, configured to determine, according to the mathematical distribution, the quantization parameter corresponding to the initial model parameters;
a model determination module 240, configured to quantize the initial model parameters of the neural network model using the quantization parameter to obtain an updated neural network model.
Optionally, in another embodiment based on the above embodiment, the quantization parameter determination module may include a first quantization parameter calculation module and a first quantization parameter determination module, wherein:
the first quantization parameter calculation module is configured to determine, according to the mathematical distribution, the degree of dispersion among the initial model parameters in the target network layer of the neural network model;
the first quantization parameter determination module is configured to determine, based on the degree of dispersion, the quantization parameter corresponding to the initial model parameters of the target network layer.
Optionally, in another embodiment based on the above embodiment, the first quantization parameter calculation module includes a standard deviation calculation module, configured to compute, according to the mathematical distribution, the standard deviation of the initial model parameters in the target network layer of the neural network model, the standard deviation being used to characterize the degree of dispersion.
Optionally, in another embodiment based on the above embodiment, the first quantization parameter determination module includes a first quantization parameter calculation module and a first quantization parameter determination submodule, wherein:
the first quantization parameter calculation module is configured to compute, according to the mathematical distribution, the product of the standard deviation of the initial model parameters of the target network layer and a preset coefficient to obtain the first value;
the first quantization parameter determination submodule is configured to determine the first value as the quantization parameter corresponding to the initial model parameters of the target network layer.
Optionally, in another embodiment based on the above embodiment, the quantization parameter determination module includes a second quantization parameter calculation module and a second quantization parameter determination module, wherein:
the second quantization parameter calculation module is configured to determine, according to the mathematical distribution, the mean of the initial model parameters in the target network layer of the neural network model;
the second quantization parameter determination module is configured to determine the quantization parameter of the initial model parameters of the target network layer based on the mean.
Optionally, in another embodiment based on the above embodiment, when the mean obtained by the second quantization parameter calculation module is the mean of the absolute values of the initial model parameters, the second quantization parameter determination module includes a second quantization parameter calculation module and a second quantization parameter determination submodule, wherein:
the second quantization parameter calculation module is configured to compute the product of the mean and a preset coefficient to obtain the second value;
the second quantization parameter determination submodule is configured to determine the second value as the quantization parameter of the initial model parameters of the target network layer.
可选地,所述量化参数确定模块230,用于根据所述数学分布,确定所述神经网络模型的目标网络层中的初始模型参数之间的离散程度;基于所述离散程度确定所述目标网络层的初始模型参数对应的量化参数。
可选地,所述量化参数确定模块230,用于计算获得所述神经网络模型的目标网络层中的初始模型参数之间的标准差,所述标准差用于表征所述离散程度。
可选地,所述量化参数确定模块230,用于计算所述初始模型参数之间的标准差与预设系数之间的乘积,获得第一数值;将所述第一数值确定为所述目标网络层的初始模型参数对应的量化参数。
可选地,所述量化参数确定模块230,用于根据所述数学分布,确定所述神经网络模型的目标网络层中的初始模型参数的均值;基于所述均值确定所述目标网络层的初始模型参数对应的量化参数。
可选地,所述均值为所述初始模型参数的绝对值均值,所述量化参数确定模块230,用于计算所述均值与预设系数之间的乘积,获得第二数值;将所述第二数值确定为所述目标网络层的初始模型参数对应的量化参数。
可选地,所述神经网络模型的各个网络层的初始模型参数所对应的数学分布不同。在一些实施例中,所述神经网络模型中至少两个网络层对应的数学分布的类型不同。
可选地,所述初始模型参数为权重类型参数、中间结果类型参数或 输出值类型参数,对于不同类型的初始模型参数所对应的数学分布不同。
可选地,所述初始模型参数为浮点型参数,所述模型确定模块240,用于基于所述量化参数将对应的网络中每个初始模型参数转换为整数型参数,得到更新后的神经网络模型,所述整数型参数为量化后的模型参数。
可选地,所述装置200还包括:
模型训练模块,用于对更新后的神经网络模型进行训练,获得训练好的神经网络模型。
可选地,所述数学分布包括正态分布、半正态分布、伯努利分布、二项分布、多项分布、均匀分布、指数分布、抽样分布中的至少一种。
An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the method process executed by the electronic device in the method embodiments shown above is performed.
An embodiment of the present application further discloses a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium; the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can execute the methods provided by the above method embodiments, for example including: obtaining initial model parameters of a neural network model; determining the mathematical distribution corresponding to the initial model parameters; determining, according to the mathematical distribution, the quantization parameter corresponding to the initial model parameters; and quantizing the initial model parameters of the neural network model using the quantization parameter to obtain an updated neural network model.
An embodiment of the present application further discloses a computer program product, which includes computer program code; when the computer program code runs on an electronic device, the electronic device executes the neural network model determination method described above.
In summary, the embodiments of the present application provide a neural network model determination method, apparatus, electronic device, and readable storage medium: a quantization parameter is determined according to the mathematical distribution corresponding to the initial model parameters of a neural network model, and the initial model parameters are quantized with it to obtain an updated neural network model. Thus, by determining the mathematical distribution of the initial model parameters, the present application takes into account certain mathematical regularities of the model parameters in each network layer, so that the quantized initial model parameters lie within a reasonable range, which reduces the storage space of the initial model parameters and improves the computation speed and accuracy of the updated neural network model.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a logical functional division, and there may be other divisions in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
In addition, units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
Herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations.
The above are merely embodiments of the present application and are not intended to limit the protection scope of the present application; for those skilled in the art, various modifications and variations of the present application are possible. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (20)

  1. A neural network model determination method, characterized in that the method comprises:
    obtaining initial model parameters of a neural network model;
    determining a mathematical distribution corresponding to the initial model parameters;
    determining, according to the mathematical distribution, a quantization parameter corresponding to the initial model parameters;
    quantizing the initial model parameters of the neural network model using the quantization parameter to obtain an updated neural network model.
  2. The method according to claim 1, characterized in that the determining, according to the mathematical distribution, a quantization parameter corresponding to the initial model parameters comprises:
    determining, according to the mathematical distribution, a degree of dispersion among initial model parameters in a target network layer of the neural network model;
    determining, based on the degree of dispersion, the quantization parameter corresponding to the initial model parameters of the target network layer.
  3. The method according to claim 2, characterized in that the determining, according to the mathematical distribution, a degree of dispersion among initial model parameters in a target network layer of the neural network model comprises:
    computing, according to the mathematical distribution, a standard deviation of the initial model parameters in the target network layer of the neural network model, the standard deviation being used to characterize the degree of dispersion.
  4. The method according to claim 2 or 3, characterized in that the determining, based on the degree of dispersion, the quantization parameter corresponding to the initial model parameters of the target network layer comprises:
    computing, according to the mathematical distribution, a product of the standard deviation of the initial model parameters of the target network layer and a preset coefficient to obtain a first value;
    determining the first value as the quantization parameter corresponding to the initial model parameters of the target network layer.
  5. The method according to claim 1, characterized in that the determining, according to the mathematical distribution, a quantization parameter corresponding to the initial model parameters comprises:
    determining, according to the mathematical distribution, a mean of the initial model parameters in a target network layer of the neural network model;
    determining, based on the mean, the quantization parameter corresponding to the initial model parameters of the target network layer.
  6. The method according to claim 5, characterized in that, when the mean is a mean of absolute values of the initial model parameters, the determining, based on the mean, the quantization parameter corresponding to the initial model parameters of the target network layer comprises:
    computing a product of the mean and a preset coefficient to obtain a second value;
    determining the second value as the quantization parameter corresponding to the initial model parameters of the target network layer.
  7. The method according to any one of claims 1-6, characterized in that the determining a mathematical distribution corresponding to the initial model parameters comprises:
    determining mathematical distributions corresponding to the initial model parameters of respective network layers of the neural network model, wherein the types of the mathematical distributions corresponding to at least two network layers of the neural network model differ.
  8. The method according to any one of claims 1-6, characterized in that the initial model parameters are weight-type parameters, intermediate-result-type parameters, or output-value-type parameters, and the mathematical distributions corresponding to different types of initial model parameters differ.
  9. The method according to any one of claims 1-6, characterized in that the initial model parameters are floating-point parameters, and the quantizing the initial model parameters of the neural network model using the quantization parameter to obtain an updated neural network model comprises:
    converting, based on the quantization parameter, each initial model parameter in the corresponding target network layer into an integer parameter to obtain the updated neural network model, the integer parameters being quantized model parameters.
  10. The method according to any one of claims 1-6, characterized in that, after obtaining the updated neural network model, the method further comprises:
    training the updated neural network model to obtain a trained neural network model.
  11. The method according to any one of claims 1-8, characterized in that the mathematical distribution comprises at least one of a normal distribution, half-normal distribution, Bernoulli distribution, binomial distribution, multinomial distribution, uniform distribution, exponential distribution, and sampling distribution.
  12. A neural network model determination apparatus, characterized in that the apparatus comprises:
    a model parameter obtaining module, configured to obtain initial model parameters of a neural network model;
    a mathematical distribution determination module, configured to determine a mathematical distribution corresponding to the initial model parameters;
    a quantization parameter determination module, configured to determine, according to the mathematical distribution, a quantization parameter corresponding to the initial model parameters;
    a model determination module, configured to quantize the initial model parameters of the neural network model using the quantization parameter to obtain an updated neural network model.
  13. The apparatus according to claim 12, characterized in that the quantization parameter determination module comprises:
    a first quantization parameter calculation module, configured to determine, according to the mathematical distribution, a degree of dispersion among initial model parameters in a target network layer of the neural network model;
    a first quantization parameter determination module, configured to determine, based on the degree of dispersion, the quantization parameter corresponding to the initial model parameters of the target network layer.
  14. The apparatus according to claim 13, characterized in that the first quantization parameter calculation module comprises a standard deviation calculation module, configured to compute, according to the mathematical distribution, a standard deviation of the initial model parameters in the target network layer of the neural network model, the standard deviation being used to characterize the degree of dispersion.
  15. The apparatus according to claim 13 or 14, characterized in that the first quantization parameter determination module comprises:
    a first quantization parameter calculation module, configured to compute, according to the mathematical distribution, a product of the standard deviation of the initial model parameters of the target network layer and a preset coefficient to obtain a first value;
    a first quantization parameter determination submodule, configured to determine the first value as the quantization parameter corresponding to the initial model parameters of the target network layer.
  16. The apparatus according to claim 12, characterized in that the quantization parameter determination module comprises:
    a second quantization parameter calculation module, configured to determine, according to the mathematical distribution, a mean of the initial model parameters in a target network layer of the neural network model;
    a second quantization parameter determination module, configured to determine the quantization parameter of the initial model parameters of the target network layer based on the mean.
  17. The apparatus according to claim 16, characterized in that, when the mean obtained by the second quantization parameter calculation module is a mean of absolute values of the initial model parameters,
    the second quantization parameter determination module comprises:
    a second quantization parameter calculation module, configured to compute a product of the mean and a preset coefficient to obtain a second value;
    a second quantization parameter determination submodule, configured to determine the second value as the quantization parameter of the initial model parameters of the target network layer.
  18. An electronic device, characterized by comprising a processor and a memory, the memory storing computer-readable instructions; when the computer-readable instructions are executed by the processor, the neural network model determination method according to any one of claims 1-11 is executed.
  19. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the neural network model determination method according to any one of claims 1-11 is executed.
  20. A computer program product, characterized by comprising computer program code; when the computer program code runs on an electronic device, the electronic device executes the neural network model determination method according to any one of claims 1-11.
PCT/CN2021/075472 2020-07-29 2021-02-05 Neural network model determination method and apparatus, electronic device, medium, and product WO2022021834A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010748015.1 2020-07-29
CN202010748015.1A CN112101543A (zh) 2020-07-29 2020-07-29 Neural network model determination method and apparatus, electronic device, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2022021834A1 (zh)

Family

ID=73749884



Also Published As

Publication number Publication date
CN112101543A (zh) 2020-12-18


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21850125; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 120523))
122 Ep: pct application non-entry in european phase (Ref document number: 21850125; Country of ref document: EP; Kind code of ref document: A1)