CN114580610A - Neural network model quantization method and device


Info

Publication number: CN114580610A
Application number: CN202210114146.3A
Authority: CN (China)
Prior art keywords: target, storage pressure, network, parameters, precision
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: Wang Nan (王楠), Zhou Peng (周鹏)
Current assignee: Alibaba Shenzhen Technology Co., Ltd. (the listed assignee may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Alibaba Shenzhen Technology Co., Ltd.
Application filed by Alibaba Shenzhen Technology Co., Ltd.
Priority application: CN202210114146.3A
Publication of CN114580610A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Feedback Control In General (AREA)

Abstract

An embodiment of this specification provides a neural network model quantization method and apparatus, wherein the neural network model quantization method comprises the following steps: determining target parameters of the multilayer network of a target model according to the parameter attributes of the initial parameters of the target model, determining the storage pressure of the multilayer network of the target model according to the target parameters, and adjusting the target parameters of a target layer network of the target model according to the storage pressure of the multilayer network. Because the storage pressure of the multilayer network is determined from the target parameters, the target layer networks and target parameters whose precision needs adjusting can be identified from that storage pressure, and different precision adjustments can then be applied to different target parameters, which improves the degree of automation and the efficiency.

Description

Neural network model quantization method and device
Technical Field
The embodiments of this specification relate to the technical field of neural networks, and in particular to a neural network model quantization method.
Background
Model quantization is an important means of model acceleration: it effectively reduces the size of a model, speeds up inference, and lowers power consumption. For low-power IoT devices in particular, model quantization has become an essential step in the engineering deployment of neural networks. Common schemes are 8-bit and 16-bit integer quantization; ultra-low-precision 2-bit and 4-bit quantization is also used on IoT devices. Lower-precision quantization further improves inference performance and reduces the model's memory overhead, but low precision also degrades the accuracy and precision of the model's results.
Mixed-precision quantization is a quantization method that allows different layers in a model, or even different operators, to use different quantization precisions; it is flexible and balances performance against precision. Finding a good mixed-precision solution, however, is a challenging task. Taking mixed int8/int16 quantization as an example, there are 2 to the power n different combinations, where n is the number of layers; for resnet50, n is 50, giving roughly 10 to the power 15 combinations. If quantization is applied at operator granularity, the number of combinations is even larger. In memory-constrained scenarios, memory usage takes priority, so automatically improving precision without exceeding the memory limit becomes a problem to be solved urgently.
Disclosure of Invention
In view of this, the embodiments of this specification provide a neural network model quantization method. One or more embodiments of this specification also relate to a neural network model quantization apparatus, a computing device, a computer-readable storage medium, and a computer program, so as to address the technical deficiencies of the prior art.
According to a first aspect of embodiments herein, there is provided a neural network model quantization method, including:
determining target parameters of a multilayer network of a target model based on a neural network according to parameter attributes of initial parameters of the target model;
determining the storage pressure of the multilayer network of the target model according to the target parameters;
and adjusting the target parameters of the target layer network of the target model according to the storage pressure of the multilayer network.
According to a second aspect of embodiments herein, there is provided a neural network model quantizing device, including:
a parameter determination module configured to determine target parameters of a multi-layer network of a neural network-based target model according to parameter attributes of initial parameters of the target model;
a pressure determination module configured to determine a storage pressure of a multi-layer network of the target model according to the target parameter;
an adjustment module configured to adjust a target parameter of a target layer network of the target model according to a storage pressure of the multilayer network.
According to a third aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions, which when executed by the processor, implement the steps of the neural network model quantization method described above.
According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the neural network model quantization method described above.
According to a fifth aspect of embodiments herein, there is provided a computer program which, when executed in a computer, causes the computer to perform the steps of the neural network model quantization method described above.
An embodiment of this specification provides a neural network model quantization method comprising the following steps: determining target parameters of the multilayer network of a target model according to the parameter attributes of the initial parameters of the target model, determining the storage pressure of the multilayer network of the target model according to the target parameters, and adjusting the target parameters of a target layer network of the target model according to the storage pressure of the multilayer network. Because the storage pressure of the multilayer network is determined from the target parameters, the target layer networks and target parameters whose precision needs adjusting can be identified from that storage pressure, and different precision adjustments can then be applied to different target parameters, which improves the degree of automation and the efficiency.
Drawings
FIG. 1 is a flowchart of a neural network model quantization method provided by an embodiment of the present specification;
FIG. 2 is a flowchart illustrating a processing procedure of a neural network model quantization method according to an embodiment of the present specification;
FIG. 3 is a schematic structural diagram of a neural network model quantization apparatus provided in an embodiment of the present specification;
FIG. 4 is a block diagram of a computing device according to an embodiment of the present specification.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be implemented in many ways other than those specifically set forth herein, and those skilled in the art will appreciate that the present description is susceptible to similar generalizations without departing from the scope of the description, and thus is not limited to the specific implementations disclosed below.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, the information should not be limited by these terms, which are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of this specification, "first" may also be referred to as "second" and, similarly, "second" as "first". Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
First, the noun terms to which one or more embodiments of the present specification relate are explained.
Model quantization: a technique that converts floating-point computation into low-bit fixed-point computation; it can effectively reduce a model's computational intensity, parameter size, and memory consumption.
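As an illustration only, a minimal sketch of the float-to-fixed-point mapping this term refers to; symmetric linear int8 quantization is assumed here, since the specification does not fix a particular scheme:

```python
import numpy as np

def quantize_int8(x):
    # Assumed symmetric linear scheme: map the observed float range onto int8.
    qmax = 127                                    # int8 codes span [-128, 127]
    scale = np.abs(x).max() / qmax                # float units per integer step
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

x = np.array([0.5, -1.2, 3.4], dtype=np.float32)
q, s = quantize_int8(x)
print(q, q * s)   # integer codes and their dequantized approximation
```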
Live variable analysis: a classic dataflow analysis in compilers that computes, for each exit point, whether each variable is live (may still be used) on some path from that point.
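As a hedged sketch of how live ranges over a layered model could be computed (the layer-to-variable representation is an assumption for illustration, not a structure defined by this specification):

```python
def live_ranges(layers):
    """layers: list of sets of variable names used by each layer network.
    Returns {variable: set of layer indices in which it is live}."""
    ranges = {}
    for idx, used in enumerate(layers):
        for var in used:
            ranges.setdefault(var, set()).add(idx)
    return ranges

# Matches the running example below: A lives in L1/L2, B in L2/L3, C in L1-L3.
print(live_ranges([{"A", "C"}, {"A", "B", "C"}, {"B", "C"}]))
```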
In the present specification, a neural network model quantization method is provided, and the present specification relates to a neural network model quantization apparatus, a computing device, a computer-readable storage medium, and a computer program, which are described in detail one by one in the following embodiments.
Fig. 1 shows a flowchart of a neural network model quantization method provided in an embodiment of the present disclosure, which specifically includes the following steps.
Step 102: determining target parameters of a multi-layer network of a target model based on a neural network according to parameter attributes of initial parameters of the target model.
The target model may be a neural network model, including but not limited to a fully connected network (FCN), a convolutional neural network (CNN), a residual network (ResNet), and a feedback neural network. The initial parameters may be the parameters input to the target model, for example variable A, variable B and variable C of the target model. The parameter attribute may be the active interval of a variable; for example, the active interval of variable A is the first-layer and second-layer networks of the target model, which means variable A is used in both the first-layer and the second-layer network. The target parameters may be some of the initial parameters, for example variable A.
It should be noted that the multi-layer network may be understood as a whole layer network in the target model, or may be a partial layer network in the target model, where the same parameters are included between the layers of the partial layer network.
In practical application, suppose the target model is a convolutional neural network whose parameters all have an initial precision of 8 bits, and 16-bit model quantization is to be applied to improve the model's computational precision. It is possible that the storage space of the hardware's memory cannot support changing all parameters of the target model to 16 bits; that is, if the precision of every parameter were raised to 16 bits, the memory space might be insufficient. In that case, a subset of the parameters of the target model must be selected and only their precision raised to 16 bits, without exceeding the storage space of the memory. To this end, the target parameters of each layer of the target model are determined first.
For example: the initial parameters of the target model comprise a variable A, a variable B and a variable C, the target model comprises an L1 layer network, an L2 layer network and an L3 layer network, and the target parameters of the L1 layer network, the L2 layer network and the L3 layer network are determined according to the parameter attributes of the variable A, the variable B and the variable C. The specific embodiment is as follows.
The determining of the target parameters of the multilayer network of the neural-network-based target model according to the parameter attributes of the initial parameters of the target model comprises the following steps:
determining initial parameters of a target model and parameter attributes of the initial parameters;
and determining target parameters of the multilayer network of the target model according to the parameter attributes of the initial parameters.
Following the above example, the initial parameters of the target model include variable A, variable B and variable C, and the target model includes the L1, L2 and L3 layer networks. The active interval of variable A is the L1 and L2 layer networks, that of variable B the L2 and L3 layer networks, and that of variable C the L1, L2 and L3 layer networks. Therefore, the target parameters of the L1 layer network include variables A and C, the target parameters of the L2 layer network include variables A, B and C, and the target parameters of the L3 layer network are variables B and C.
The embodiment of the specification determines the target parameter according to the parameter attribute of the initial parameter, so that the system overhead is reduced, and the efficiency is improved.
The determining of the initial parameters of the target model and the parameter attributes of the initial parameters includes:
determining initial parameters of a target model;
performing active variable analysis on the initial parameters to obtain the life cycle of the initial parameters;
correspondingly, the determining the target parameters of the multi-layer network of the target model according to the parameter attributes of the initial parameters includes:
and determining the target parameters of the multilayer network of the target model according to the life cycle of the initial parameters.
The life cycle may be understood as a cycle in which variables in the model are used, and may also be referred to as an active interval in the above embodiment.
Following the above example, the initial parameters of the target model include variable A, variable B and variable C, and the target model includes the L1, L2 and L3 layer networks. The active interval (live range) of each variable can be obtained through live variable analysis: specifically, the identifier of a variable is obtained, and each layer network of the target model is queried with that identifier to determine whether the variable is used there. Querying the L1, L2 and L3 layer networks with the identifier of variable A shows that variable A exists in the L1 and L2 layer networks, so its active interval is the L1 and L2 layer networks. In the same way, the active interval of variable B is found to be the L2 and L3 layer networks, and the active interval of variable C the L1, L2 and L3 layer networks. Therefore, the target parameters of the L1 layer network include variables A and C, the target parameters of the L2 layer network include variables A, B and C, and the target parameters of the L3 layer network are variables B and C.
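The inverse step, deriving each layer's target parameters from the live ranges, could look like the following sketch (function and variable names are illustrative assumptions):

```python
def target_parameters(ranges, num_layers):
    """ranges: {variable: set of layer indices}; returns per-layer parameter sets."""
    per_layer = [set() for _ in range(num_layers)]
    for var, layer_ids in ranges.items():
        for idx in layer_ids:
            per_layer[idx].add(var)
    return per_layer

ranges = {"A": {0, 1}, "B": {1, 2}, "C": {0, 1, 2}}
print(target_parameters(ranges, 3))
# L1 -> {A, C}, L2 -> {A, B, C}, L3 -> {B, C}, as in the example above
```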
The embodiment of the specification determines the target parameters according to the life cycle of the initial parameters, can accurately determine the target parameters of the multilayer network, and improves the accuracy.
Step 104: and determining the storage pressure of the multilayer network of the target model according to the target parameters.
The storage pressure may be the pressure that a layer network of the target model places on the memory; for example, if the first-layer network of the target model occupies 50 bytes of memory, the storage pressure of the first-layer network is 50 bytes.
In practical application, after the target parameters of the multilayer network of the target model are determined, the storage pressure of the multilayer network can be calculated through the storage pressure of the target parameters of the multilayer network.
Specifically, the determining the storage pressure of the multilayer network of the target model according to the target parameter includes:
acquiring the quantity and the precision of target parameters of a multilayer network of the target model;
and obtaining the storage pressure of the multilayer network according to the quantity and the precision of the target parameters of the multilayer network.
In practical application, determining the storage pressure of the multilayer network means determining how much memory space each layer needs to occupy at runtime. To determine this, first determine how many target parameters exist in the layer network and their precision, then multiply the number of parameters by the size implied by the corresponding precision to obtain the storage pressure of the layer network.
For example, the target model includes an L1 layer network and an L2 layer network, where the target parameters of the L1 layer network are variables A and B and those of the L2 layer network are variables B and C, and the storage pressure is calculated at the initial precision of 8 bits. In the L1 layer network, variable A occupies 8 bits, i.e. 1 byte of storage pressure, and variable B likewise occupies 1 byte, so the storage pressure of the L1 layer network is 2 bytes. If the dimension of the L2 layer network is 2, the storage pressures of variables B and C are multiplied by 2: variable B occupies 1 byte and variable C occupies 1 byte, so with the dimension factor of 2 the storage pressure of the L2 layer network is 4 bytes.
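A minimal sketch of this computation, assuming storage pressure is simply the parameter count times the bytes implied by the precision, times an optional per-layer dimension factor as in the example:

```python
def layer_storage_pressure(params, precision_bits, dim=1):
    # Bytes occupied by a layer's target parameters at the given precision.
    return len(params) * (precision_bits // 8) * dim

print(layer_storage_pressure({"A", "B"}, 8))          # L1 layer network: 2 bytes
print(layer_storage_pressure({"B", "C"}, 8, dim=2))   # L2 layer network: 4 bytes
```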
According to the embodiment of the specification, the storage pressure of the multilayer network can be rapidly determined according to the precision and the quantity of the target parameters of the multilayer network, the calculation mode is simple, and the calculation efficiency is improved.
Step 106: and adjusting the target parameters of the target layer network of the target model according to the storage pressure of the multilayer network.
The target layer network may be any layer network in the target model, for example, the target layer network is a L1 layer network and a L2 layer network; the target parameter may be any parameter in the target layer network of the target model, such as: the target parameters are variable a and variable B.
In one case, the adjusting target parameters of a target layer network of the target model according to the storage pressure of the multilayer network includes:
reducing the precision of the target parameters of the multilayer network to a preset precision under the condition that the storage pressure of the multilayer network is greater than the storage pressure of the initial storage space of the target object; wherein the target model runs on the target object.
The target object may be a memory, and accordingly the storage space may be the storage space of the memory. The preset precision may be any precision smaller than the current precision of the target parameter; for example, if the current precision of the target parameter is 8 bits, the preset precision may be 2 bits or 4 bits.
In practical applications, running the parameters of the target model at the initially set precision may exceed the storage pressure the memory can bear, that is, the memory is insufficient and the target model cannot run. In that case, the precision of the parameters in the target model needs to be set lower so that the target model can run.
For example, the precision of the parameters in the target model is 8 bits and the target model comprises an L1 layer network containing variables A, B, C and D. Calculating the storage pressure of the multilayer network of the target model with a dimension of 2, the storage pressure of the L1 layer network is 8 bytes. If the storage space of the memory is only 6 bytes, the memory cannot support running the target model, so the precision of the L1 layer network is adjusted to 4 bits and its storage pressure becomes 4 bytes.
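A sketch of this uniform fallback under the example's assumptions (halving the precision halves the pressure; the helper name is illustrative):

```python
def uniform_downgrade(layer_pressures, memory_bytes, current_bits, preset_bits):
    # If any layer overflows the memory, scale every layer's pressure down
    # to the preset precision and report the new precision.
    if any(p > memory_bytes for p in layer_pressures):
        scale = preset_bits / current_bits
        return [p * scale for p in layer_pressures], preset_bits
    return layer_pressures, current_bits

# L1 needs 8 bytes but only 6 are available: drop from 8 bits to 4 bits.
print(uniform_downgrade([8], memory_bytes=6, current_bits=8, preset_bits=4))
```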
In this embodiment, when the storage space cannot bear the storage pressure at the current precision, the precision is reduced uniformly so that the target model can run; this reduces the overhead of the procedure on system performance and improves efficiency.
In another case, the adjusting the target parameters of the target layer network of the target model according to the storage pressure of the multilayer network includes:
determining the precision ratio of the multilayer network according to the target precision of the target parameters of the multilayer network of the target model, and adjusting the target parameters of the target layer network of the target model according to the precision ratio of the multilayer network and the storage space of the target object; specifically:
Determining a target accuracy of a target parameter of a multi-layer network of the target model;
determining an accuracy ratio according to the target accuracy and the accuracy of the target parameters of the multilayer network;
determining the storage pressure of a proportional storage space of the target object according to the precision proportion;
and adjusting target parameters of a target layer network of the target model according to the storage pressure of the multilayer network and the storage pressure of the proportional storage space.
The target precision can be understood as the precision to be achieved; for example, if the current precision is 8 bits, the target precision may be 16 bits. The precision ratio may be the ratio of the current precision to the target precision, for example 8 bits to 16 bits, i.e. 0.5. The proportional storage space may be the space calculated from the precision ratio; for example, with a 100-byte memory and a precision ratio of 0.5, the proportional storage space is 50 bytes.
In practical application, the proportional storage space may be determined according to the target accuracy, and how to adjust the accuracy of the parameters in the target model is determined according to the proportional storage space, that is, it is determined whether the memory space is sufficient under the condition that the accuracy of all the parameters of the target model is set to the target accuracy. When the memory space is enough, the precision of all the parameters is set as the target precision, and when the memory space is not enough, the precision of part of the parameters is set as the target precision.
Specifically, the adjusting the target parameters of the target layer network of the target model according to the storage pressure of the multilayer network and the storage pressure of the proportional storage space includes:
and under the condition that the storage pressure of the multilayer network is smaller than the storage pressure of the proportional storage space, improving the precision of the target parameters of the multilayer network to the target precision.
For example, the initial precision of the target model's parameters is 8 bits and the target precision of the target parameters of its multilayer network is set to 16 bits, so the precision ratio is 0.5. With a 100-byte memory, the proportional storage space is 50 bytes and its storage pressure is 50 bytes. If the storage pressures of all layers of the target model, calculated at the initial 8-bit precision, are smaller than 50 bytes, the target parameters of every layer are set to the 16-bit target precision.
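The same check, sketched with this example's numbers (the values and names are illustrative, not prescribed by the specification):

```python
memory = 100                                   # bytes available on the target object
current_bits, target_bits = 8, 16
proportional_space = memory * current_bits / target_bits   # 50 bytes
layer_pressures = [30, 45]                     # hypothetical 8-bit layer pressures
if all(p < proportional_space for p in layer_pressures):
    # every layer still fits when its pressure doubles at 16 bits
    precision = target_bits
```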
In this embodiment, when the storage space can bear the storage pressure of twice the current precision, the precision is raised uniformly so that the target model runs at the higher precision; this keeps the overhead of the procedure on system performance low and improves efficiency.
In another implementation, the adjusting of the target parameters of the target layer network of the target model according to the storage pressure of the multilayer network and the storage pressure of the proportional storage space includes:
determining, as a target layer network, a layer network of the target model whose storage pressure is less than or equal to the storage pressure of the initial storage space of the target object and greater than the storage pressure of the proportional storage space;
and adjusting the precision of the target parameters of the target layer network according to the storage pressure of the target layer network.
For example, the initial precision of the parameters of the target model is 8 bits and the target precision of the target parameters of the multilayer network is set to 16 bits, so the precision ratio is 0.5; with 100 bytes of memory, the proportional storage space is 50 bytes and its storage pressure is 50 bytes. If the target model includes an L1 layer network and an L2 layer network whose storage pressures, calculated at the initial 8-bit precision, are 40 bytes and 60 bytes respectively, then the L2 layer network is the one to be adjusted.
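A sketch of this selection rule: a layer is a target layer network if it fits the initial storage space but overflows the proportional storage space (names are illustrative):

```python
def select_target_layers(layer_pressures, memory_bytes, proportional_bytes):
    return {name: p for name, p in layer_pressures.items()
            if proportional_bytes < p <= memory_bytes}

# With 100 bytes of memory and a 50-byte proportional space,
# only the 60-byte L2 layer network needs per-parameter adjustment.
print(select_target_layers({"L1": 40, "L2": 60}, 100, 50))   # {'L2': 60}
```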
The embodiment of the specification performs different precision adjustments on different target parameters, and under the condition of ensuring the precision, the limitation of the memory is not exceeded, so that the use efficiency of the memory resource is improved.
The adjusting the precision of the target parameter of the target layer network according to the storage pressure of the target layer network comprises:
sorting the target layer networks in descending order of storage pressure to obtain a first ranking list;
and adjusting the precision of the target parameters of the target layer network in a first adjusting mode according to the first ranking list.
The first ranking list may be a data list containing target layer networks; for example, the first ranking list includes an L1 layer network and an L2 layer network. The first adjustment mode may be any adjustment mode that performs precision adjustment on the target layer network; the embodiments of this specification are not limited in this respect.
For example, the target layer networks are the L1, L2 and L3 layer networks, with storage pressures of 60 bytes, 70 bytes and 75 bytes respectively. Sorted by storage pressure, the first ranking list is: the L3 layer network, the L2 layer network, the L1 layer network, and the precision of the target parameters of these networks is adjusted by the first adjustment mode.
In the embodiment of the specification, the target network layers are sorted in a descending order, the target network layer with the highest influence is processed first, the system overhead of the subsequent adjusting step is reduced, and the efficiency is improved.
A specific embodiment of adjusting the accuracy of the target parameters of the L1-layer network, the L2-layer network, and the L3-layer network by the first adjustment method is as follows.
The adjusting of the precision of the target parameters of the target layer network by a first adjustment mode according to the first ranking list comprises:
setting the precision of a target layer network in the first ranking list as target precision;
determining a first target layer network in the first ranking list;
obtaining a storage pressure difference value according to the storage pressure of the first target layer network and the storage pressure of the proportional storage space;
performing precision adjustment on the target parameters in the first target layer network according to the storage pressure difference, and deleting the first target layer network from the first ranking list;
and continuing to select the first target layer network in the first ranking list until the first ranking list is empty.
The storage pressure difference may be understood as a difference between a storage pressure of the target layer network and a storage pressure of the proportional storage space, for example: the storage pressure of the target layer network is 60 bytes, the storage pressure of the proportional storage space is 50 bytes, and the storage pressure difference is 10 bytes.
Continuing the example, the memory space is 100 bytes and the target layer networks are the L1, L2 and L3 layer networks, all at 8-bit precision, with storage pressures of 60, 70 and 75 bytes respectively. Sorted by storage pressure, the first ranking list is: the L3 layer network, the L2 layer network, the L1 layer network, and the precision of all three is set to 16 bits. The first target layer network in the first ranking list is then selected: for the L3 layer network, the 50-byte storage pressure of the proportional storage space is subtracted from its 75-byte storage pressure, giving a storage pressure difference of 25 bytes; the L3 layer network is precision-adjusted according to this difference and then deleted from the first ranking list. Next, the L2 layer network's 70 bytes minus the 50-byte proportional storage space gives a difference of 20 bytes; the L2 layer network is adjusted accordingly and deleted from the first ranking list. Finally, the L1 layer network's 60 bytes minus 50 bytes gives a difference of 10 bytes; the L1 layer network is adjusted accordingly and deleted from the first ranking list.
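In outline, the loop just described might be sketched as follows; adjust_layer stands in for the per-parameter adjustment detailed in the following paragraphs and is a hypothetical helper:

```python
def process_first_ranking(target_layers, proportional_bytes, adjust_layer):
    # target_layers: {layer name: storage pressure in bytes}.
    ranking = sorted(target_layers, key=target_layers.get, reverse=True)
    while ranking:
        layer = ranking.pop(0)                             # first (largest) target layer
        diff = target_layers[layer] - proportional_bytes   # storage pressure difference
        adjust_layer(layer, diff)                          # adjust its parameters

# Reproduces the example: L3 (25 bytes), then L2 (20), then L1 (10).
process_first_ranking({"L1": 60, "L2": 70, "L3": 75}, 50,
                      lambda layer, diff: print(layer, "difference:", diff, "bytes"))
```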
The embodiment of the specification determines the storage pressure difference value first, and adjusts according to the storage pressure difference value, so that the memory space can be utilized to the maximum degree, and meanwhile, the precision is improved to the maximum degree.
Specifically, the performing of precision adjustment on the target parameters in the first target layer network according to the storage pressure difference includes:
determining a current accuracy of a target parameter in the first target layer network;
adjusting the storage pressure difference according to the current precision of the target parameters to determine the current storage pressure difference, and determining a target parameter list, wherein the target parameter list comprises the target parameters whose current precision is the target precision;
and under the condition that the current storage pressure difference value is larger than a preset pressure threshold value, performing precision adjustment on the target parameters in the parameter list.
The current precision is the precision the target parameter currently has. Because the target layer networks are adjusted in a loop in the above embodiment, a target parameter shared with an already-adjusted layer may itself have been adjusted, so its current precision must be determined first. The current storage pressure difference is the running value of the difference during the loop. The target parameter list holds the parameters of the target layer network selected for adjustment. The preset pressure threshold may be a set threshold, for example 0 bytes or 10 bytes.
Following the above example, take the L2 layer network as the first target layer network currently at the head of the first ranking list: its 70-byte storage pressure minus the 50-byte storage pressure of the proportional storage space gives a 20-byte storage pressure difference, according to which the L2 layer network is adjusted. The L2 layer network comprises variables A, B and C; the current precision of variable A is determined to be 8 bits, that of variable B 16 bits, and that of variable C 16 bits. The current storage pressure difference is determined from the 20-byte storage pressure difference and variable A, and variables B and C are put into the target parameter list. Then, given that the current storage pressure difference is greater than the preset pressure threshold of 0 bytes, the precision of variables B and C in the parameter list is adjusted.
The embodiment of the specification determines the current precision of the target parameter so as to avoid repeated calculation and improve the calculation efficiency.
Specifically, the adjusting the storage pressure difference according to the current precision of the target parameter to determine the current storage pressure difference includes:
judging whether the target parameter has a previous target parameter;
if not, in the case that the current precision of the target parameter is inconsistent with the target precision, subtracting the storage pressure of the target parameter from the storage pressure difference to obtain the current storage pressure difference;
if so, in the case that the current precision of the target parameter is inconsistent with the target precision, subtracting the storage pressure of the target parameter from the current storage pressure difference left by the previous target parameter to obtain an updated current storage pressure difference.
Following the above example, the current precision of variable A is 8 bits, that of variable B is 16 bits, and that of variable C is 16 bits, and the current storage pressure difference is determined from the 20-byte storage pressure difference and variable A. Since variable A is 1-dimensional and is the first variable of the current cycle, 1 byte is subtracted from the 20-byte storage pressure difference, giving a current storage pressure difference of 19 bytes.
In another case, if a further variable D also has a precision of 8 bits, the storage pressure of variable D is then subtracted from the 19-byte current storage pressure difference to obtain the updated current storage pressure difference.
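A sketch of this running update, inferred from the two examples above (the parameter records and names are assumptions):

```python
def current_pressure_difference(diff, layer_params, target_bits):
    # Subtract from the layer's storage pressure difference the pressure of
    # every parameter whose current precision is not the target precision.
    for pressure, bits in layer_params.values():
        if bits != target_bits:          # e.g. still at the initial 8 bits
            diff -= pressure
    return diff

# Variables A and D (1 byte each, still 8 bits): 20 -> 19 -> 18 bytes.
params = {"A": (1, 8), "B": (2, 16), "C": (2, 16), "D": (1, 8)}
print(current_pressure_difference(20, params, target_bits=16))   # 18
```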
In this embodiment, the current storage pressure difference is updated on each iteration of the loop, which keeps the current storage pressure difference accurate.
Next, continuing with the case where the current storage pressure difference is greater than the preset pressure threshold of 0 bytes, the specific implementation of adjusting the precision of variables B and C in the parameter list is as follows.
When the current storage pressure difference value is greater than a preset pressure threshold value, performing precision adjustment on the target parameters in the parameter list, including:
sorting the target parameters in the parameter list in descending order according to their storage pressure to obtain a second sorted list;
and performing precision adjustment on the target parameters in the parameter list in a second adjustment mode according to the second sorted list.
The second sorted list may be a data list containing target parameters; for example, the second sorted list includes variable B and variable C. The second adjustment mode may be any adjustment mode that performs precision adjustment on the target parameters; the embodiments of this specification are not limited in this respect.
Following the above example, the target parameter list includes variable B and variable C, with storage pressures of 10 bytes and 16 bytes respectively, so the second sorted list is: variable C, then variable B. The precision of the variables in the parameter list is then adjusted by the second adjustment mode.
In this embodiment, once the current storage pressure difference has been determined, target layer networks that already meet the storage pressure requirement are skipped, which shortens the calculation flow and saves system overhead.
The specific way of performing precision adjustment on variables B and C in the parameter list by the second adjustment mode is as follows.
The precision adjustment of the target parameters in the parameter list through a second adjustment mode according to the second sorted list includes:
determining a first target parameter in the second sorted list;
adjusting the precision of the first target parameter to an initial precision;
determining the saved storage pressure of the first target layer network according to the initial precision;
deleting the first target parameter from the second sorted list in the case that the saved storage pressure is smaller than the current storage pressure difference;
and continuing to determine the first target parameter in the second sorted list until the saved storage pressure covers the current storage pressure difference or the second sorted list is empty.
The initial precision may be the precision of the model's parameters before any adjustment, for example 8 bits. The saved storage pressure is the storage pressure reclaimed by adjusting a parameter back to a lower precision; for example, adjusting the precision from 16 bits to 8 bits saves half of the parameter's storage pressure.
Following the above example, with a current storage pressure difference of 10 bytes, the second sorted list is: variable C, then variable B. The precision of variable C is adjusted from 16 bits to 8 bits and variable C is deleted from the second sorted list, saving 8 bytes of storage pressure; since the saved 8 bytes is smaller than the current storage pressure difference of 10 bytes, variable B is selected next, its precision is adjusted from 16 bits to 8 bits and it is deleted from the second sorted list, saving a further 5 bytes. The total saved storage pressure is then 8 bytes plus 5 bytes, i.e. 13 bytes, which covers the difference, so the precision adjustment of the current target layer network is complete.
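A sketch of this second pass under the example's saving convention (reverting a parameter from 16 bits to 8 bits reclaims half of its stated storage pressure; names are illustrative):

```python
def revert_until_fit(param_pressures, current_diff, plan, init_bits=8):
    # Revert parameters, largest storage pressure first, until the
    # accumulated saving covers the current storage pressure difference.
    saved = 0.0
    for name in sorted(param_pressures, key=param_pressures.get, reverse=True):
        if saved >= current_diff:
            break
        plan[name] = init_bits                 # back to the initial precision
        saved += param_pressures[name] / 2     # halving precision halves pressure
    return saved

plan = {"B": 16, "C": 16}
print(revert_until_fit({"B": 10, "C": 16}, 10, plan))   # 13.0 bytes saved
print(plan)                                             # both reverted to 8 bits
```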
In this embodiment, the target parameters whose precision has not yet been reverted are sorted in descending order and the parameters with the largest influence are adjusted first, until the storage pressure requirement is met; this shortens the calculation process and improves efficiency.
Referring to fig. 2, fig. 2 is a flowchart illustrating a processing procedure of a neural network model quantization method according to an embodiment of the present specification, which specifically includes the following steps.
Step 202: and determining initial parameters of the target model, and performing active variable analysis on the initial parameters to obtain the life cycle of the initial parameters.
The target model may be a neural network model, including but not limited to a fully connected network (FCN), a convolutional neural network (CNN), a residual network (ResNet), and a feedback neural network. The initial parameters may be the parameters input to the target model, for example variable A, variable B and variable C of the target model. The life cycle may be understood as the period during which a variable in the model is used, and may also be called its active interval.
In practical application, suppose the target model is a convolutional neural network whose parameters all have an initial precision of 8 bits, and 16-bit model quantization is to be applied to improve the model's computational precision. It is possible that the storage space of the hardware's memory cannot support changing all parameters of the target model to 16 bits; that is, if the precision of every parameter were raised to 16 bits, the memory space might be insufficient. In that case, a subset of the parameters of the target model must be selected and only their precision raised to 16 bits, without exceeding the storage space of the memory. To this end, the target parameters of each layer of the target model are determined first.
For example, the initial parameters of the target model include variable A, variable B and variable C, and the target model includes the L1, L2 and L3 layer networks. The active interval (live range) of each variable can be obtained through live variable analysis: specifically, the identifier of a variable is obtained, and each layer network of the target model is queried with that identifier to determine whether the variable is used there. Querying the L1, L2 and L3 layer networks with the identifier of variable A shows that variable A exists in the L1 and L2 layer networks, so its active interval is the L1 and L2 layer networks. In the same way, the active interval of variable B is found to be the L2 and L3 layer networks, and the active interval of variable C the L1, L2 and L3 layer networks.
Step 204: and determining the target parameters of the multilayer network of the target model according to the life cycle of the initial parameters.
The target parameter may be a part of the initial parameters, for example variable A.
Following the above example, the active interval of variable A is the L1 and L2 layer networks, that of variable B the L2 and L3 layer networks, and that of variable C the L1, L2 and L3 layer networks; therefore the target parameters of the L1 layer network include variables A and C, those of the L2 layer network include variables A, B and C, and those of the L3 layer network are variables B and C.
Step 206: and determining the storage pressure of the multilayer network of the target model according to the target parameters.
The storage pressure may be the pressure that a layer network of the target model places on the memory; for example, if the first-layer network of the target model occupies 50 bytes of memory, the storage pressure of the first-layer network is 50 bytes.
In practical application, after the target parameters of the multilayer network of the target model are determined, the storage pressure of the multilayer network can be calculated through the storage pressure of the target parameters of the multilayer network.
Following the above example, the target model includes the L1, L2 and L3 layer networks, where the target parameters of the L1 layer network are variables A and C, the target parameters of the L2 layer network are variables A, B and C, and the target parameters of the L3 layer network are variables B and C. Calculating the storage pressure at the initial precision of 8 bits, the storage pressure of variable A is 30 bytes, that of variable B is 25 bytes, and that of variable C is 40 bytes. The storage pressure of the L1 layer network is therefore 70 bytes, that of the L2 layer network is 95 bytes, and that of the L3 layer network is 65 bytes.
Step 208: determining the target precision of the target parameters of the multilayer network of the target model, determining the precision proportion according to the target precision and the precision of the target parameters of the multilayer network, and determining the storage pressure of the proportional storage space of the target object according to the precision proportion.
The target precision can be understood as the precision to be achieved; for example, if the current precision is 8 bits, the target precision may be 16 bits. The precision ratio may be the ratio of the current precision to the target precision, for example 8 bits to 16 bits, i.e. 0.5. The proportional storage space may be the space calculated from the precision ratio; for example, with a 100-byte memory and a precision ratio of 0.5, the proportional storage space is 50 bytes. The target parameter may be any parameter in the target layer network of the target model.
In practical application, the proportional storage space may be determined according to the target accuracy, and how to adjust the accuracy of the parameters in the target model is determined according to the proportional storage space, that is, it is determined whether the memory space is sufficient under the condition that the accuracy of all the parameters of the target model is set to the target accuracy. When the memory space is enough, the precision of all the parameters is set as the target precision, and when the memory space is not enough, the precision of part of the parameters is set as the target precision.
For example, the initial precision of the parameters of the target model is 8bits, the target precision of the target parameters of the multilayer network of the target model is set to 16bits, and the precision ratio can be determined to be 0.5. In the case of 100 bytes of memory, it can be determined that the proportional storage space is 50 bytes, and the storage pressure of the proportional storage space is 50 bytes.
Step 210: determining, as target layer networks, the layer networks of the target model whose storage pressure is less than or equal to the storage pressure of the initial storage space of the target object and greater than the storage pressure of the proportional storage space.
The target layer network can be any layer network in the target model.
Following the above example, the initial precision of the parameters of the target model is 8 bits and the target precision of the target parameters is set to 16 bits, so the precision ratio is 0.5; with 100 bytes of memory, the proportional storage space is 50 bytes and its storage pressure is 50 bytes. The storage pressure of the L1 layer network is 70 bytes, that of the L2 layer network 95 bytes, and that of the L3 layer network 65 bytes, so the L1, L2 and L3 layer networks are all target layer networks.
Step 212: sorting the target layer networks in descending order of storage pressure to obtain a first ranking list, and setting the precision of the target layer networks in the first ranking list to the target precision.
The first ranking list may be a data list including a target layer network, for example, the first ranking list includes an L1 layer network and an L2 layer network.
Following the above example, the target layer networks are the L1, L2 and L3 layer networks, with storage pressures of 70 bytes, 95 bytes and 65 bytes respectively. Sorted in descending order of storage pressure, the first ranking list is: the L2 layer network, the L1 layer network, the L3 layer network. The precision of the L1, L2 and L3 layer networks is set to 16 bits.
Step 214: and determining a first target layer network in the first ranking list, and acquiring a storage pressure difference value according to the storage pressure of the first target layer network and the storage pressure of the proportional storage space.
The storage pressure difference may be understood as a difference between the storage pressure of the target layer network and the storage pressure of the proportional storage space, for example: the storage pressure of the target layer network is 60 bytes, the storage pressure of the proportional storage space is 50 bytes, and the storage pressure difference is 10 bytes.
Following the above example, the first target layer network in the first ranking list is selected: for the L2 layer network, the 50-byte storage pressure of the proportional storage space is subtracted from the L2 layer network's 95-byte storage pressure, giving a storage pressure difference of 45 bytes.
Step 216: adjusting the storage pressure difference according to the current precision of the target parameters to determine the current storage pressure difference, and determining a target parameter list, wherein the target parameter list comprises the target parameters whose current precision is the target precision.
The current precision is the precision the target parameter currently has. Because the target layer networks are adjusted in a loop in the above embodiment, a target parameter shared with an already-adjusted layer may itself have been adjusted, so its current precision must be determined first. The current storage pressure difference is the running value of the difference during the loop. The target parameter list holds the parameters of the target layer network selected for adjustment.
Continuing with the L2 layer network and its 45-byte storage pressure difference: the L2 layer network comprises variables A, B and C; the current precision of variable A is determined to be 8 bits, that of variable B 16 bits, and that of variable C 16 bits. The current storage pressure difference is then determined from the 45-byte storage pressure difference and the 30-byte storage pressure of variable A: 15 bytes. Variables B and C are put into the target parameter list.
Step 218: in the case that the current storage pressure difference is greater than a preset pressure threshold, sorting the target parameters in the parameter list in descending order of storage pressure to obtain a second sorted list.
The preset pressure threshold may be a set threshold, for example, 0 byte, 10 bytes.
Following the above example, the current storage pressure difference is greater than the preset pressure threshold of 0 bytes; the target parameter list includes variable B and variable C, with storage pressures of 25 bytes and 40 bytes respectively, so the second sorted list is: variable C, then variable B.
Step 220: determining the first target parameter in the second sorted list; adjusting the precision of the first target parameter to the initial precision; and determining the saved storage pressure of the first target layer network according to the initial precision.
The initial precision may be the precision a parameter of the target model has before any adjustment, for example 8 bits. The saved storage pressure may be the storage pressure saved by adjusting a parameter to a lower precision; for example, adjusting the precision from 16 bits to 8 bits saves half of the parameter's storage pressure.
Continuing the above example, the current storage pressure difference value is 15 bytes and the second sorted list contains variable C, with 40 bytes of storage pressure, and variable B, with 25 bytes of storage pressure. The precision of variable B is adjusted from 16 bits to 8 bits and variable B is deleted from the second sorted list, saving 12.5 bytes of storage pressure.
Step 222: in the case that the saved storage pressure is less than the current storage pressure difference value, deleting the first target parameter from the second sorted list; and continuing to determine the first target parameter in the second sorted list until the second sorted list is empty, then deleting the first target layer network from the first ranking list.
Continuing the above example, the saved storage pressure of 12.5 bytes is less than the current storage pressure difference value of 15 bytes, so variable C is selected next: its precision is adjusted from 16 bits to 8 bits and it is deleted from the second sorted list, saving another 20 bytes of storage pressure. The new saved storage pressure is 12.5 bytes plus 20 bytes, that is 32.5 bytes, and the second sorted list is now empty, so the precision adjustment of the L2 layer network is complete and the L2 layer network is deleted from the first ranking list.
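The sketch below strings steps 218 to 222 together. Note that it pops parameters in the descending order defined in step 218 (so variable C goes first, whereas the prose example happens to process variable B first) and stops as soon as the saved storage pressure covers the difference; the halving rule for a 16-bit to 8-bit adjustment comes from the example, and everything else is an assumption.

```python
# Sketch of steps 218-222: restore parameters to the initial precision,
# largest storage pressure first, until enough storage pressure is saved.
threshold = 0                              # preset pressure threshold, bytes
current_diff = 15                          # bytes, from step 216
initial_bits = 8
params = {"B": {"bits": 16, "pressure": 25},
          "C": {"bits": 16, "pressure": 40}}

if current_diff > threshold:
    # Second sorted list: target parameters in descending storage pressure.
    second_sorted = sorted(params, key=lambda n: params[n]["pressure"],
                           reverse=True)   # ['C', 'B']
    saved = 0.0
    while second_sorted:
        name = second_sorted.pop(0)        # first target parameter
        p = params[name]
        saved += p["pressure"] * (1 - initial_bits / p["bits"])
        p["bits"] = initial_bits           # adjust to the initial precision
        if saved >= current_diff:          # difference covered: stop early
            break
    print(saved)                           # 20.0, variable C alone suffices here
```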
Step 224: judging whether the first ranking list is empty.
Continuing the above example, if the first ranking list is not empty, steps 214 to 222 are executed again to adjust the precision of the L1 layer network and the L3 layer network; once the first ranking list is empty, the adjustment is complete.
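To show how the steps compose, here is a compact, self-contained driver covering steps 214 to 224; the function name, data layout and halving arithmetic are illustrative assumptions, not the embodiment's implementation.

```python
# Hypothetical driver for steps 214-224 (all names and values illustrative).
def quantize(layers, proportional_pressure, target_bits=16,
             initial_bits=8, threshold=0):
    """layers: {layer: {param: {"bits": int, "pressure": float}}} (assumed)."""
    # First ranking list: layers in descending order of total storage pressure.
    ranking = sorted(
        layers,
        key=lambda l: sum(p["pressure"] for p in layers[l].values()),
        reverse=True)
    while ranking:                                        # step 224
        layer = ranking.pop(0)                            # step 214
        params = layers[layer]
        diff = (sum(p["pressure"] for p in params.values())
                - proportional_pressure)
        todo = []                                         # step 216
        for name, p in params.items():
            if p["bits"] != target_bits:
                diff -= p["pressure"]
            else:
                todo.append(name)
        if diff <= threshold:
            continue
        todo.sort(key=lambda n: params[n]["pressure"], reverse=True)  # step 218
        saved = 0.0
        while todo and saved < diff:                      # steps 220-222
            p = params[todo.pop(0)]
            saved += p["pressure"] * (1 - initial_bits / p["bits"])
            p["bits"] = initial_bits

layers = {"L2": {"A": {"bits": 8, "pressure": 30},
                 "B": {"bits": 16, "pressure": 25},
                 "C": {"bits": 16, "pressure": 40}}}
quantize(layers, proportional_pressure=50)
print(layers)   # variable C drops to 8 bits; variable B keeps 16 bits here
```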
Corresponding to the above method embodiment, the present specification further provides an embodiment of a neural network model quantization apparatus, and fig. 3 shows a schematic structural diagram of a neural network model quantization apparatus provided in an embodiment of the present specification. As shown in fig. 3, the apparatus includes:
a parameter determination module 302 configured to determine target parameters of a multi-layer network of a neural network-based target model according to parameter attributes of initial parameters of the target model;
a pressure determination module 304 configured to determine a storage pressure of the multi-layer network of the target model according to the target parameter;
an adjusting module 306 configured to adjust a target parameter of a target layer network of the target model according to a storage pressure of the multilayer network.
Optionally, the parameter determining module 302 is further configured to:
determining initial parameters of a target model;
performing active variable analysis on the initial parameters to obtain the life cycle of the initial parameters;
correspondingly, the determining the target parameters of the multi-layer network of the target model according to the parameter attributes of the initial parameters includes:
determining the target parameters of the multilayer network of the target model according to the life cycle of the initial parameters.
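The embodiment does not spell the live variable analysis out, so the following is only a rough sketch of the idea, with parameter lifetimes assumed to be (first use, last use) layer indices.

```python
# Rough sketch: a parameter whose lifetime spans a layer is live there and
# counts toward that layer's storage pressure (lifetimes are assumed values).
lifetimes = {"A": (0, 2), "B": (1, 2), "C": (2, 2)}   # (first_use, last_use)

def live_params(layer_idx):
    """Target parameters of one layer: those live while the layer executes."""
    return [name for name, (first, last) in lifetimes.items()
            if first <= layer_idx <= last]

print(live_params(1))   # ['A', 'B']
```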
Optionally, the pressure determination module 304 is further configured to:
acquiring the quantity and the precision of target parameters of a multilayer network of the target model;
and obtaining the storage pressure of the multilayer network according to the quantity and the precision of the target parameters of the multilayer network.
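A one-line formula is the simplest reading of "quantity and precision", so the sketch below assumes storage pressure = quantity × bits / 8 bytes; the formula is an assumption that happens to be consistent with the byte figures used in the examples.

```python
# Sketch: storage pressure of a layer from parameter quantities and precisions,
# assuming pressure = quantity * bits / 8 bytes per parameter tensor.
def storage_pressure(quantities, bits):
    return sum(q * b / 8 for q, b in zip(quantities, bits))

# e.g. two parameter tensors: 20 values at 16 bits and 10 values at 8 bits.
print(storage_pressure([20, 10], [16, 8]))   # 50.0 bytes
```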
Optionally, the adjusting module 306 is further configured to:
determining a target accuracy of a target parameter of a multi-layer network of the target model;
determining an accuracy ratio according to the target accuracy and the accuracy of the target parameters of the multilayer network;
determining the storage pressure of a proportional storage space of the target object according to the precision proportion;
and adjusting target parameters of a target layer network of the target model according to the storage pressure of the multilayer network and the storage pressure of the proportional storage space.
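The text does not give an explicit formula for the proportional storage space, so this sketch assumes the simplest reading: the precision ratio is the target precision over the current precision, applied to the initial storage space of the target object.

```python
# Sketch (assumed formula): proportional storage space from the precision ratio.
initial_space = 100                                    # bytes (assumed budget)
current_bits, target_bits = 16, 8
precision_ratio = target_bits / current_bits           # 0.5
proportional_space = initial_space * precision_ratio
print(proportional_space)                              # 50.0 bytes
```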
Optionally, the adjusting module 306 is further configured to:
determining a network with the storage pressure less than or equal to the storage pressure of the initial storage space of the target object and greater than the storage pressure of the proportional storage space in the target model as a target layer network;
and adjusting the precision of the target parameters of the target layer network according to the storage pressure of the target layer network.
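A sketch of the selection rule in the two clauses above; every number here is illustrative.

```python
# Sketch: layers whose storage pressure fits within the initial storage space
# but exceeds the proportional storage space become target layer networks.
initial_pressure, proportional_pressure = 100, 50      # bytes (assumed)
layer_pressure = {"L1": 80, "L2": 95, "L3": 60, "L4": 40}
target_layers = [l for l, p in layer_pressure.items()
                 if proportional_pressure < p <= initial_pressure]
print(target_layers)   # ['L1', 'L2', 'L3']
```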
Optionally, the adjusting module 306 is further configured to:
sorting the target layer networks in descending order according to their storage pressure to obtain a first ranking list;
and adjusting the precision of the target parameters of the target layer network in a first adjusting mode according to the first ranking list.
Optionally, the adjusting module 306 is further configured to:
setting the precision of a target layer network in the first ranking list as target precision;
determining a first target layer network in the first ranking list;
obtaining a storage pressure difference value according to the storage pressure of the first target layer network and the storage pressure of the proportional storage space;
performing precision adjustment on the target parameters in the first target layer network according to the storage pressure difference value, and deleting the first target layer network from the first ranking list;
and continuing to select the first target layer network in the first ranking list until the first ranking list is empty.
Optionally, the adjusting module 306 is further configured to:
determining a current accuracy of a target parameter in the first target layer network;
adjusting the storage pressure difference value according to the current precision of the target parameters, determining the current storage pressure difference value, and determining a target parameter list, wherein the target parameter list comprises the target parameters whose current precision is the target precision;
and under the condition that the current storage pressure difference value is larger than a preset pressure threshold value, performing precision adjustment on the target parameters in the parameter list.
Optionally, the adjusting module 306 is further configured to:
judging whether the target parameter has a previous target parameter;
if not, in the case that the current precision of the target parameter is inconsistent with the target precision, subtracting the storage pressure of the target parameter from the storage pressure difference value to obtain the current storage pressure difference value;
if so, in the case that the current precision of the target parameter is inconsistent with the target precision, subtracting the storage pressure of the target parameter from the current storage pressure difference value of the previous target parameter to obtain an updated current storage pressure difference value.
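A small sketch of this branch logic as corrected above; reading the two branches as swapped in the original translation is an editorial judgment, so treat the whole rule as an assumption.

```python
# Sketch: walk the target parameters in order, carrying a running difference.
# A parameter whose current precision differs from the target precision
# subtracts its storage pressure from the previous parameter's difference.
def current_diff(pressure_diff, param_seq, target_bits=16):
    diff = pressure_diff                   # no previous target parameter yet
    for bits, pressure in param_seq:
        if bits != target_bits:            # inconsistent with target precision
            diff -= pressure               # update the running difference
    return diff

# Running example: A (8 bits, 30 B), B (16 bits, 25 B), C (16 bits, 40 B).
print(current_diff(45, [(8, 30), (16, 25), (16, 40)]))   # 15
```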
Optionally, the adjusting module 306 is further configured to:
according to the storage pressure of the target parameters in the parameter list, performing descending sorting on the target parameters in the parameter list to obtain a second sorted list;
and performing precision adjustment on the target parameters in the parameter list through a second adjustment mode according to the second sorted list.
Optionally, the adjusting module 306 is further configured to:
determining a first target parameter in the second sorted list;
adjusting the precision of the first target parameter to an initial precision;
determining the saving storage pressure of the first target layer network according to the initial precision;
deleting a first target parameter in the second sorted list under the condition that the saved storage pressure is smaller than the current storage pressure difference;
and continuing to determine the first target parameter in the second sorted list until the second sorted list is empty.
The embodiment of the present specification provides a neural network model quantization apparatus which determines the target parameters of the multilayer network of a target model according to the parameter attributes of the initial parameters of the target model, determines the storage pressure of the multilayer network of the target model according to the target parameters, and adjusts the target parameters of a target layer network of the target model according to the storage pressure of the multilayer network. In this way, the storage pressure of the multilayer network is determined through the target parameters; the target layer networks and the target parameters whose precision is to be adjusted are determined according to that storage pressure; and different precision adjustments are then applied to different target parameters, which improves the degree of automation and the efficiency of quantization.
The above is a schematic scheme of a neural network model quantization apparatus of this embodiment. It should be noted that the technical solution of the neural network model quantization apparatus and the technical solution of the neural network model quantization method belong to the same concept, and details of the technical solution of the neural network model quantization apparatus, which are not described in detail, can be referred to the description of the technical solution of the neural network model quantization method.
FIG. 4 illustrates a block diagram of a computing device 400 provided in accordance with one embodiment of the present description. The components of the computing device 400 include, but are not limited to, a memory 410 and a processor 420. Processor 420 is coupled to memory 410 via bus 430 and database 450 is used to store data.
Computing device 400 also includes an access device 440 that enables computing device 400 to communicate via one or more networks 460. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 440 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)), whether wired or wireless, such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 400, as well as other components not shown in FIG. 4, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 4 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 400 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 400 may also be a mobile or stationary server.
Wherein the processor 420 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the neural network model quantification method described above.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the neural network model quantization method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the neural network model quantization method.
An embodiment of the present specification also provides a computer-readable storage medium storing computer-executable instructions, which when executed by a processor, implement the steps of the neural network model quantization method described above.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the neural network model quantization method described above, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the neural network model quantization method described above.
An embodiment of the present specification further provides a computer program, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the neural network model quantization method.
The above is an illustrative scheme of a computer program of the present embodiment. It should be noted that the technical solution of the computer program and the technical solution of the neural network model quantization method belong to the same concept, and details that are not described in detail in the technical solution of the computer program can be referred to the description of the technical solution of the neural network model quantization method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals, in accordance with legislation and patent practice.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of action combinations, but those skilled in the art should understand that the embodiments are not limited by the described order of actions, because some steps may be performed in other orders or simultaneously according to the embodiments. Furthermore, those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and that the actions and modules involved are not necessarily required by this specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims (14)

1. A neural network model quantification method, comprising:
determining target parameters of a multilayer network of a target model based on a neural network according to parameter attributes of initial parameters of the target model;
determining the storage pressure of the multilayer network of the target model according to the target parameters;
and adjusting the target parameters of the target layer network of the target model according to the storage pressure of the multilayer network.
2. The method of claim 1, the determining target parameters of a multi-layer network of a neural network-based target model according to parameter properties of initial parameters of the target model, comprising:
determining initial parameters of a target model;
performing active variable analysis on the initial parameters to obtain the life cycle of the initial parameters;
and determining the target parameters of the multilayer network of the target model according to the life cycle of the initial parameters.
3. The method of claim 1, the determining a storage pressure of a multi-layer network of the target model from the target parameters, comprising:
acquiring the quantity and the precision of target parameters of a multilayer network of the target model;
and obtaining the storage pressure of the multilayer network according to the quantity and the precision of the target parameters of the multilayer network.
4. The method of claim 1, the adjusting target parameters of a target tier network of the target model according to storage pressure of the multi-tier network, comprising:
determining a target accuracy of a target parameter of a multi-layer network of the target model;
determining an accuracy ratio according to the target accuracy and the accuracy of the target parameters of the multilayer network;
determining the storage pressure of a proportional storage space of the target object according to the precision proportion;
and adjusting target parameters of a target layer network of the target model according to the storage pressure of the multilayer network and the storage pressure of the proportional storage space.
5. The method of claim 4, wherein adjusting the target parameters of the target layer network of the target model according to the storage pressure of the multi-layer network and the storage pressure of the proportional storage space comprises:
determining a network with a storage pressure less than or equal to that of an initial storage space of the target object and greater than that of the proportional storage space in the target model as a target layer network;
and adjusting the precision of the target parameters of the target layer network according to the storage pressure of the target layer network.
6. The method of claim 5, the adjusting the accuracy of the target parameters of the target tier network according to storage pressure of the target tier network, comprising:
sorting the target layer networks in descending order according to the storage pressure of the target layer networks to obtain a first ranking list;
and adjusting the precision of the target parameters of the target layer network in a first adjusting mode according to the first ranking list.
7. The method of claim 6, wherein the adjusting the accuracy of the target parameters of the target layer network by a first adjustment according to the first sorted list comprises:
setting the precision of a target layer network in the first ranking list as target precision;
determining a first target layer network in the first ranking list;
obtaining a storage pressure difference value according to the storage pressure of the first target layer network and the storage pressure of the proportional storage space;
according to the storage pressure difference value, performing precision adjustment on the target parameters in the first target layer network, and deleting the first target layer network from the first ranking list;
and continuing to select the first target layer network in the first ranking list until the first ranking list is empty.
8. The method of claim 7, wherein the performing precision adjustment on the target parameters in the first target layer network according to the storage pressure difference value comprises:
determining a current accuracy of a target parameter in the first target layer network;
adjusting the storage pressure difference value according to the current precision of the target parameter, determining the current storage pressure difference value, and determining a target parameter list, wherein the target parameter list comprises the target parameters whose current precision is the target precision;
and under the condition that the current storage pressure difference value is larger than a preset pressure threshold value, performing precision adjustment on the target parameters in the parameter list.
9. The method of claim 8, wherein the adjusting the storage pressure difference value according to the current precision of the target parameter and determining the current storage pressure difference value comprises:
judging whether the target parameter has a previous target parameter;
if not, in the case that the current precision of the target parameter is inconsistent with the target precision, subtracting the storage pressure of the target parameter from the storage pressure difference value to obtain the current storage pressure difference value;
if so, in the case that the current precision of the target parameter is inconsistent with the target precision, subtracting the storage pressure of the target parameter from the current storage pressure difference value of the previous target parameter to obtain an updated current storage pressure difference value.
10. The method of claim 8, wherein the performing precision adjustment on the target parameter in the parameter list if the current storage pressure difference is greater than a preset pressure threshold comprises:
according to the storage pressure of the target parameters in the parameter list, performing descending sorting on the target parameters in the parameter list to obtain a second sorted list;
and performing precision adjustment on the target parameters in the parameter list in a second adjustment manner according to the second sorted list.
11. The method of claim 10, wherein the performing precision adjustment on the target parameters in the parameter list according to the second sorted list by a second adjustment manner comprises:
determining a first target parameter in the second sorted list;
adjusting the precision of the first target parameter to an initial precision;
determining the saving storage pressure of the first target layer network according to the initial precision;
deleting a first target parameter in the second sorted list under the condition that the saved storage pressure is smaller than the current storage pressure difference;
and continuing to determine the first target parameter in the second sorted list until the second sorted list is empty.
12. An apparatus for neural network model quantization, comprising:
a parameter determination module configured to determine target parameters of a multi-layer network of a neural network-based target model according to parameter attributes of initial parameters of the target model;
a pressure determination module configured to determine a storage pressure of a multi-layer network of the target model according to the target parameter;
an adjustment module configured to adjust a target parameter of a target layer network of the target model according to a storage pressure of the multilayer network.
13. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions and the processor is configured to execute the computer-executable instructions, which when executed by the processor, implement the steps of the neural network model quantification method of any one of claims 1 to 11.
14. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the neural network model quantification method of any one of claims 1 to 11.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210114146.3A 2022-01-30 2022-01-30 Neural network model quantification method and device

Publications (1)

Publication Number Publication Date
CN114580610A true CN114580610A (en) 2022-06-03

Family

ID=81769961

Country Status (1)

Country Link
CN (1) CN114580610A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116167431A (en) * 2023-04-25 2023-05-26 之江实验室 Service processing method and device based on hybrid precision model acceleration

Similar Documents

Publication Publication Date Title
CN110889503B (en) Data processing method, data processing device, computer equipment and storage medium
CN109766800B (en) Construction method of mobile terminal flower recognition model
US11928599B2 (en) Method and device for model compression of neural network
WO2022012123A1 (en) Data processing method and apparatus, electronic device, and storage medium
CN111199740B (en) Unloading method for accelerating automatic voice recognition task based on edge calculation
CN109948036A (en) A kind of calculation method and device segmenting lexical item weight
CN114580610A (en) Neural network model quantification method and device
WO2023093721A1 (en) Resource recall method and apparatus, and network-side device
CN112085182A (en) Data processing method, data processing device, computer equipment and storage medium
CN110874635A (en) Deep neural network model compression method and device
CN105573726B (en) A kind of rules process method and equipment
CN117371508A (en) Model compression method, device, electronic equipment and storage medium
EP4339843A1 (en) Neural network optimization method and apparatus
CN108648136B (en) Method and device for compressing two-dimensional lookup table
CN114924966A (en) Test case generation method and device based on database PL language
CN115269543A (en) Data sampling method
CN114385596A (en) Data processing method and device
CN114841325A (en) Data processing method and medium of neural network model and electronic device
CN114741332A (en) Memory access method and device, computing equipment and storage medium
CN108763381A (en) Divide table method and apparatus based on consistency hash algorithm
CN114461709A (en) Data processing method and device
CN110046643B (en) Service parameter tuning method and device
CN112604300B (en) Update package generation method and device
US11880693B2 (en) System for automatically generating electronic artifacts using extended functionality
CN114817315B (en) Data processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination