CN114049530A

CN114049530A - Hybrid precision neural network quantization method, device and equipment

Info

Publication number: CN114049530A
Application number: CN202111221339.0A
Authority: CN
Inventors: 程文华; 王留锋; 吕倪祺; 方民权; 游亮; 龙欣
Original assignee: Alibaba China Co Ltd; Alibaba Cloud Computing Ltd
Current assignee: Alibaba China Co Ltd; Alibaba Cloud Computing Ltd
Priority date: 2021-10-20
Filing date: 2021-10-20
Publication date: 2022-02-15

Abstract

The application discloses a hybrid precision neural network quantization method, device and equipment. The method determines a mixed precision quantization strategy by combining reinforcement learning with verification data generation: the distribution difference between the output of a mixed precision network and the output of the network to be quantized is determined on an automatically generated verification data set, and this distribution difference serves as the criterion for judging the quantization strategy. With this processing mode, the user does not need to provide verification data for checking the accuracy of the quantized network, and the time spent on checking that accuracy is reduced; the quantized network determined by reinforcement learning trades a small precision loss for a large reduction in computation without the user having to fine-tune it, which simplifies the network quantization process and makes the quantization strategy search transparent to the user. Therefore, while keeping the precision loss of the model small, the method can effectively improve model quantization efficiency, protect the user's data privacy, and reduce labor cost.

Description

Hybrid precision neural network quantization method, device and equipment

Technical Field

The application relates to the technical field of machine learning, and in particular to a hybrid precision neural network quantization method, device and system, and to electronic equipment.

Background

With the development of deep learning, machine learning models based on neural networks are widely applied in various fields, and the improvement in model performance comes with a huge number of parameters and a huge amount of computation. Mixed precision quantization quantizes the weights of each layer of a neural network to different precisions; for example, the floating-point calculation of some network layers is converted into low-bit fixed-point calculation while the other network layers still use floating-point calculation, so that a smaller model is obtained.

Mixed precision quantization can effectively reduce the computational intensity, parameter size and memory consumption of a model, but usually brings a larger precision loss. How to search for a mixed precision quantization strategy that achieves a better trade-off between network computation and accuracy has therefore been widely studied. A typical mixed precision quantization strategy search method searches for the quantization strategy of each layer of the network and fine-tunes the quantized network after each search so as to evaluate the quantization strategy more accurately; after the whole strategy search is finished, fine-tuning training is performed again. Such a method checks the accuracy of the various candidate quantized networks on a validation data set provided by the user in order to select a quantized network that achieves a better trade-off between network computation and accuracy.

However, in the process of implementing the invention, the inventors found that the existing scheme has at least the following problems: 1) fine-tuning training complicates the network quantization flow, for example because the training hyper-parameters must be determined; 2) running inference over the large validation set provided by the user in order to obtain the accuracy of a quantized network is very time-consuming; 3) quantization calibration and verification require data provided by the user, which may cause problems such as privacy leakage. In summary, how to make model quantization transparent to the user, quickly determine a mixed precision network with higher precision, and protect the user's sensitive private data has become a problem that those skilled in the art urgently need to solve.

Disclosure of Invention

The application provides a mixed precision neural network quantization method, which aims to solve the problems in the prior art that model quantization is not transparent to the user, quantization efficiency is low, model precision loss is large, and user data privacy cannot be protected. The application further provides a mixed precision neural network quantization device, a mixed precision neural network quantization system and electronic equipment.

The application provides a mixed precision neural network quantization method, which comprises the following steps:

acquiring a network to be quantized;

generating a verification data set according to a network to be quantized;

and determining a target mixed precision network of the network to be quantized according to the verification data set through a reinforcement learning algorithm.

Optionally, the determining, by using a reinforcement learning algorithm, a target mixed precision network of a network to be quantized according to the verification data set includes:

determining a plurality of mixed precision networks of the network to be quantized according to the resource usage threshold;

determining loss data of the mixed precision network relative to a network to be quantized according to the verification data set;

and determining a target mixed precision network from a plurality of mixed precision networks according to the loss data.

Optionally, the determining, according to the verification data set, loss data of the mixed precision network with respect to the to-be-quantized network includes:

determining first output data of a network to be quantized and second output data of the mixed precision network according to the verification data set;

and determining the loss data according to the first output data and the second output data.

Optionally, the determining a target mixed precision network from a plurality of mixed precision networks according to the loss data includes:

taking the mean value of the loss data over the verification data as the exponent of an exponential function, and determining a target mixed precision network from the plurality of mixed precision networks according to the exponential function value.

and determining a target mixed precision network from a plurality of mixed precision networks according to the loss data and the resource usage amount of the mixed precision network.

Optionally, the determining, by using a reinforcement learning algorithm, a target mixed precision network of a network to be quantized according to the verification data set further includes:

and calibrating the mixed precision network according to the verification data set.

Optionally, the determining, according to the resource usage threshold, a plurality of mixed precision networks of the network to be quantized includes:

and if the resource usage of the mixed precision network is larger than the resource usage threshold, reducing the precision of a part of layers of the mixed precision network.

Optionally, the reducing the precision of the partial layer of the hybrid precision network includes:

determining the partial layers in a random manner.

Optionally, the observation features of the reinforcement learning algorithm include: the calculation precision bit number of a plurality of network layers of the mixed precision network, the calculation amount of the plurality of network layers and the current quantized network layer identification.

Optionally, the method further includes:

determining a resource usage threshold of the network to be quantized according to the device resource data of the target device of the network to be quantized;

determining the target mixed precision network according to the verification data set and the resource usage threshold value through a reinforcement learning algorithm;

deploying a target hybrid precision network to the target device.

The present application further provides a mixed precision neural network quantization apparatus, including:

the network acquisition unit is used for acquiring a network to be quantized;

the data generation unit is used for generating a verification data set according to the network to be quantized;

and the reinforcement learning unit is used for determining a target mixed precision network of the network to be quantized according to the verification data set through a reinforcement learning algorithm.

Optionally, the reinforcement learning unit includes:

the network quantization subunit is used for determining a plurality of mixed precision networks of the network to be quantized according to the resource usage threshold;

the loss determining subunit is used for determining loss data of the mixed precision network relative to the network to be quantized according to the verification data set;

and the network selection subunit is used for determining a target mixed precision network from the plurality of mixed precision networks according to the loss data.

The present application further provides an electronic device, comprising:

a processor; and

and the memory is used for storing a program for realizing the hybrid precision neural network quantization method, and the terminal is powered on and runs the program of the method through the processor.

The present application further provides a hybrid precision neural network processing system, comprising:

a network construction device, the above mixed precision neural network quantization device, and a network deployment device.

The present application further provides an electronic device, comprising:

a processor and a memory; a memory for storing a program for implementing the above method, the device being powered on and the program for the method being run by the processor.

The present application also provides a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to perform the various methods described above.

The present application also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the various methods described above.

Compared with the prior art, the method has the following advantages:

According to the mixed precision neural network quantization method provided by the embodiment of the application, the mixed precision quantization strategy is determined by combining reinforcement learning with verification data generation: the distribution difference between the output of the mixed precision network and the output of the network to be quantized is determined on an automatically generated verification data set, and this distribution difference serves as the criterion for judging the quantization strategy. With this processing mode, the user does not need to provide verification data for checking the accuracy of the quantized network, and the time spent on checking that accuracy is reduced; the quantized network determined by reinforcement learning trades a small precision loss for a large reduction in computation without the user having to fine-tune it, which simplifies the network quantization process and makes the quantization strategy search transparent to the user. Therefore, while keeping the precision loss of the model small, the method can effectively improve model quantization efficiency, protect the user's data privacy, and reduce labor cost. In addition, this processing mode enables the converted network layers that use fixed-point calculation to actually perform low-bit convolution, so that a large reduction in computation is obtained at the cost of a small precision loss, thereby keeping the precision loss of the model low.

Drawings

FIG. 1 is a schematic view of an application scenario of an embodiment of a network quantization method provided in the present application;

FIG. 2 is a schematic flow chart of an embodiment of a network quantization method provided in the present application;

FIG. 3 is a schematic flow chart of an embodiment of a network quantization method provided in the present application;

FIG. 4 is a schematic diagram of data mapping between different precisions in an embodiment of a network quantization method provided in the present application;

FIG. 5 is a schematic diagram of a reinforcement learning process of an embodiment of a network quantization method provided in the present application.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The application can, however, be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.

In the application, a hybrid precision neural network quantization method, a device and a system, and an electronic device are provided. Each of the schemes is described in detail in the following examples.

Please refer to FIG. 1, which is a schematic view of an application scenario of an embodiment of a hybrid precision neural network quantization method provided in the present application. In this embodiment, a terminal device (e.g., a smart phone, a smart television, a smart air conditioner, etc.) may execute model prediction processing through a locally deployed neural network model. For example, a smart phone uses a locally deployed 'collision action' recognition model to determine, from the state data of its motion detection devices (e.g., a gyroscope), whether a collision action has occurred; if so, it may execute application processing related to the collision action, such as adding a friend in an instant messaging application. Because the computational resources of terminal devices are limited, hybrid precision networks are typically deployed on them. Meanwhile, with the continuous development of graphics processing unit (GPU) chips, low-bit fixed-point computation (such as Int8 and Int4) has become several times faster than floating-point computation (such as FP32 and FP16), so using fixed-point computation as much as possible on the terminal device greatly benefits the size and speed of the model.

In practical application, a user on the network builder side can train a neural-network-based prediction model through a network construction server and submit the model to the network quantizer as the network to be quantized; a user on the network quantizer side then quantizes the original network to be quantized through the network quantization server. Finally, the quantized network can be deployed to the terminal device side, where prediction processing is performed through the quantized network.

It should be noted that the method provided in the embodiment of the present application is not limited to the application scenario shown in fig. 1, and the application scenario shown in fig. 1 is only an optional application scenario. In specific implementation, the network builder and the network quantizer may be designed separately as shown in fig. 1, or may be integrated, that is, the network builder is responsible for model quantization processing. In addition, the quantization network may be deployed to a terminal device side, or may be deployed to a device side such as a server.

First embodiment

Please refer to fig. 2, which is a flowchart illustrating an embodiment of a hybrid precision neural network quantization method according to the present application. In this embodiment, the method may include the steps of:

Step S201: acquiring the network to be quantized.

The network to be quantized refers to a machine learning model based on a neural network, such as an object detection model, a 'collision action' recognition model, a voice recognition model and the like. The number of parameters of the network to be quantized is usually large. Taking the commonly used AlexNet as an example, the number of parameters of each network layer is shown in the following table:

As can be seen from the table, AlexNet has 62369155 parameters in total, and its parameters consume 238 MB of memory. As another example, a 16-layer VGG network has 138357544 parameters in total, and its parameters consume 528 MB of memory.
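The memory figures above follow directly from the parameter counts when each parameter is stored as a 32-bit float. The following is a minimal sketch of that arithmetic; the function name `param_memory_mib` is hypothetical, and "MB" is assumed here to mean 2^20 bytes, which is the reading that reproduces the quoted figures.

```python
def param_memory_mib(num_params: int, bits_per_param: int = 32) -> float:
    """Memory needed to store num_params parameters, in units of 2**20 bytes."""
    return num_params * bits_per_param / 8 / 2**20


alexnet_params = 62_369_155  # total parameter count quoted in the text

# FP32 storage: 4 bytes per parameter.
print(round(param_memory_mib(alexnet_params, 32)))
# Assumption for illustration: every layer stored as Int8 (1 byte per parameter).
print(round(param_memory_mib(alexnet_params, 8)))
```

The second call only illustrates why lowering the bit width shrinks the model; a real mixed precision network would keep some layers in floating point.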

The parameters of each layer of the network to be quantized may be computed in floating point (such as FP32 or FP16). Because the number of parameters is huge, the amount of computation consumed when the network to be quantized runs is also huge. In order to obtain a smaller model, the method provided by the embodiment of the application can be used to quantize the parameters (weights) of each layer of the neural network to different precisions; for example, the floating-point calculation of some network layers is converted into low-bit fixed-point calculation while the other network layers still use floating-point calculation, so that a large reduction in computation is obtained at the cost of a small precision loss.

Step S203: generating a verification data set according to the network to be quantized.

In the network quantization process, the precision of each network layer needs to be determined, and the accuracy of the quantized network needs to be checked according to the verification data set. According to the method provided by the embodiment of the application, the verification data set is generated according to the network to be quantized and is used for checking the accuracy rate of the quantized network.

The parameters of the network to be quantized are learned from a training data set, so the verification data automatically generated from the network to be quantized is data that shares certain characteristics with the training data. In this embodiment, the verification data set may be generated in a self-supervised manner by using the characteristics of the BN (Batch Normalization) layers of the network.

The machine learning field assumes that training data, verification data and test data follow the same distribution, which is the basic guarantee that a model obtained from the training data performs well on the test set. BatchNorm is an algorithm that keeps the input of each layer of a deep neural network identically distributed during training, which speeds up training and improves convergence speed and stability.

In one example, the input data of the network to be quantized is image data, and the verification data set automatically generated from the network to be quantized is a verification image set. In specific implementation, the ZeroQ image generation technique may be adopted: the verification data is generated according to the difference between the mean and variance that the data produces at the BatchNorm layers and the mean and variance stored in those layers of the network to be quantized. Since generating a verification data set from the network to be quantized belongs to the mature prior art, the detailed description is omitted here.
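The idea of matching BatchNorm statistics can be illustrated with a deliberately tiny example. The sketch below is not the ZeroQ implementation: it assumes a hypothetical one-layer "network" `w*x + b` feeding a BatchNorm layer whose stored mean and variance are `bn_mean` and `bn_var`, and it optimizes a batch of scalar inputs by plain gradient descent so that the activations reproduce those statistics. All names and constants are illustrative.

```python
import random


def batch_stats(values):
    """Batch mean and (population) variance of a list of numbers."""
    m = sum(values) / len(values)
    return m, sum((val - m) ** 2 for val in values) / len(values)


def generate_inputs(w, b, bn_mean, bn_var, n=64, lr=0.1, steps=3000, seed=0):
    """Optimize n scalar inputs so that the activations w*x + b have
    batch mean close to bn_mean and batch variance close to bn_var."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # start from random noise
    for _ in range(steps):
        y = [w * xi + b for xi in x]
        m, v = batch_stats(y)
        # Gradient of (m - bn_mean)^2 + (v - bn_var)^2 with respect to each x_i.
        for i in range(n):
            dy = 2 * (m - bn_mean) / n + 2 * (v - bn_var) * 2 * (y[i] - m) / n
            x[i] -= lr * w * dy
    return x


data = generate_inputs(w=2.0, b=1.0, bn_mean=5.0, bn_var=9.0)
print(batch_stats([2.0 * xi + 1.0 for xi in data]))
```

In a real network the same loss would be summed over every BatchNorm layer and minimized with an automatic differentiation framework; the toy keeps only the statistic-matching objective.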

It should be noted that the verification data automatically generated from the network to be quantized differs from verification data provided by the user in two respects: 1) The verification data provided by the user contains private data, whereas the verification data automatically generated from the network to be quantized does not contain the user's private data. The verification data provided by the user is usually a portion split off from the training data set that is not used for training; its main function is to check the effect of model training, that is, to measure at intervals the accuracy of the currently trained model on the verification data set so as to prevent under-training or over-training. 2) Each piece of verification data provided by the user carries fewer features, so the accuracy of the quantized network has to be checked on a larger verification data set (e.g., thousands of pictures), and the repeated checks are time-consuming. Each piece of verification data automatically generated from the network to be quantized carries more features (for example, 1 picture contains thousands of features), so checking the accuracy of the quantized network on a small amount of verification data (for example, 64 pictures) already gives a good result, which speeds up the quantization strategy search under matching features.

In summary, the method provided by the embodiment of the present application determines the distribution gap between the output of the hybrid precision network and the output of the network to be quantized on the automatically generated verification data set, and uses this distribution gap as the criterion for evaluating the quantization strategy. With this processing mode, the user does not need to provide verification data for checking the accuracy of the quantized network, the time spent on checking that accuracy is reduced, and the quantization strategy search becomes transparent to the user; therefore, model quantization efficiency can be effectively improved and the user's data privacy can be protected.

Step S205: determining a target mixed precision network of the network to be quantized according to the verification data set through a reinforcement learning algorithm.

The method provided by the embodiment of the application can use a Reinforcement Learning (RL) agent to search for the quantization strategy of each network layer in the network to be quantized, and check the accuracy of each mixed precision network (quantized network for short) on the verification data set, so as to obtain a target mixed precision network with higher accuracy.

The target mixed precision network can be a neural network with mixed precision. In the neural network with mixed precision, the data type of the parameters of a part of network layers is a floating point type, and floating point operation is adopted; the data type of the parameters of a part of network layers is integer, and fixed-point operation is adopted. The network layer with the floating point precision can include a network layer with FP32 precision, a network layer with FP16 precision and the like. Network layers with fixed point accuracy may include a network layer with Int8 accuracy, a network layer with Int4 accuracy, and a network layer with Int2 accuracy.

The target hybrid precision network meets the calculation quantity requirement, namely the resource usage amount of the target hybrid precision network is less than or equal to the resource usage amount threshold value. In this embodiment, the target hybrid precision network is determined according to the verification data set and the resource usage threshold by a reinforcement learning algorithm.

The resource usage threshold is the upper limit on the device resources that the target hybrid precision network may consume; it includes at least a computing resource threshold and may further include a storage resource threshold. The resource usage threshold may be determined according to application requirements, for example manually and empirically.

In one example, the method may further comprise the following step: determining the resource usage threshold of the network to be quantized according to the device resource data of the target device of the network to be quantized. In this way, after the target mixed precision network is determined, it is deployed to the target device for operation. The target device includes but is not limited to mobile terminals such as smart phones and tablet computers, and may further include a server, a personal computer, a smart television, a smart speaker, a smart refrigerator, an unmanned automobile, and the like.

Reinforcement learning, also known as evaluative learning, is used to describe and solve the problem of an agent learning a strategy that maximizes its return or achieves a specific goal while interacting with an environment. Intuitively, the problem that reinforcement learning solves is: let the agent learn how to act in an environment so as to obtain the maximum sum of reward values (total reward). The reward value is associated with the task goal defined for the agent. The main thing the agent has to learn is its action policy. The goal of learning is to optimize the policy, that is, to find a policy under which the agent's behavior in a specific environment obtains the maximum reward, thereby achieving the task goal.

The method provided by the embodiment of the application applies reinforcement learning to the quantization of mixed precision neural networks: the agent learns the quantization strategy (i.e., the action) of each network layer in the network to be quantized, so that a mixed precision network with higher accuracy is obtained. In specific implementation, Deep Deterministic Policy Gradient (DDPG) or Twin Delayed DDPG (TD3) may be used as the reinforcement learning agent.

As shown in FIG. 5, in the present embodiment TD3 is used as the agent. TD3 learns both a policy network and a value network (there may be a pair of twin value networks) and is suitable for continuous action spaces. The value network Q(s, a) models the value of taking an action a in a state (also called observation) s, and is trained through the self-consistency of the Bellman equation. The policy network π(s) models which action a should be predicted in a given state s to obtain the maximum reward. Given a value network Q, this objective is readily available: choose the action that maximizes Q(s, a). Experiments show that the robustness and convergence of TD3 are greatly improved compared with DDPG.
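The Bellman self-consistency used to train twin value networks can be sketched as follows. This follows the general TD3 formulation (both critics regress toward a target built from the smaller of the two target-critic estimates) rather than anything specified in this application; the function names and the default discount `gamma` are illustrative assumptions.

```python
def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Bellman target shared by both critics.

    q1_next and q2_next are the twin target critics' estimates for the next
    state and the (noise-smoothed) next action; taking the minimum counters
    over-estimation of the value.
    """
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


def critic_loss(q_pred, target):
    """Squared Bellman error for one critic on one transition."""
    return (q_pred - target) ** 2


# Non-terminal transition: the bootstrap term uses the smaller estimate (2.0).
print(td3_target(reward=1.0, done=0.0, q1_next=2.0, q2_next=3.0, gamma=0.5))
# Terminal transition: only the immediate reward remains.
print(td3_target(reward=1.0, done=1.0, q1_next=2.0, q2_next=3.0, gamma=0.5))
```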

In this embodiment, the reward function (return function) of the reinforcement learning algorithm may be determined according to loss data of the hybrid precision network relative to the network to be quantized, so as to measure the precision loss of the hybrid precision network. In specific implementation, the verification data may be used as input data of the mixed precision network and the network to be quantized, respectively, and a difference between output data of the two networks may be used as the loss data. In this case, step S205 may include the following sub-steps:

Step S2051: determining a plurality of mixed precision networks of the network to be quantized according to the resource usage threshold.

The reinforcement learning agent can generate a plurality of mixed precision networks meeting the resource usage requirement through the policy network. As shown in fig. 5, one mixed-precision network is generated as follows: the first layer adopts floating point operation (FP), the Nth layer adopts fixed point operation (8bit), and the last layer adopts floating point operation (FP).

In this embodiment, step S2051 may be implemented as follows: if the resource usage of the mixed precision network is larger than the resource usage threshold, the precision of some layers of the mixed precision network is reduced. The task of the quantization strategy search is to search for a quantization strategy under a given computation (FLOPs) limit, and the computation used by the quantized network at run time must not exceed that limit. If the path explored at some point makes the computation of the quantized network exceed the limit, some layers of the quantized network can be converted to low-bit computation, for example by randomly converting some layers from floating-point computation to fixed-point (for example, 8-bit) computation. This processing mode ensures that the quantized network meets the resource usage requirement. The computation usage of a neural network can be calculated with mature prior art and is not described here again.
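The budget enforcement described above can be sketched as a small loop. The cost model is an assumption made for illustration only (FLOPs weighted by bit width relative to 32 bits); the application does not specify how resource usage depends on precision. The names `usage` and `enforce_budget` are hypothetical.

```python
import random


def usage(layers):
    """Hypothetical cost proxy: FLOPs of each layer scaled by bits / 32."""
    return sum(flops * bits / 32 for flops, bits in layers)


def enforce_budget(layers, budget, low_bits=8, seed=0):
    """Randomly lower layers to low_bits until usage(layers) <= budget.

    layers is a list of (flops, bits) pairs. If every layer is already at
    low precision and the budget is still exceeded, the loop stops and the
    all-low-precision network is returned.
    """
    rng = random.Random(seed)
    layers = list(layers)
    while usage(layers) > budget:
        candidates = [i for i, (_, bits) in enumerate(layers) if bits > low_bits]
        if not candidates:
            break
        i = rng.choice(candidates)  # pick the layer to convert at random
        layers[i] = (layers[i][0], low_bits)
    return layers


network = [(100, 32), (200, 32), (50, 32)]
constrained = enforce_budget(network, budget=150)
print(constrained, usage(constrained))
```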

In one example, the observation features of the reinforcement learning algorithm may include: the computation precision bit widths of a plurality of network layers of the mixed precision network, the computation amounts of those network layers, and the identifier of the network layer currently being quantized. This is a globally observable observation space: the observation features of each network layer are its quantization bit width (Int2, Int4, Int8, FP16, FP32, etc.) and its computation amount (FLOPs), forming a 2-dimensional feature. If the network has N layers to be quantized, the observation space is the concatenation of the N 2-dimensional features. In addition, one further dimension identifies the layer whose strategy is currently being set. Feeding in global observation features lets the reinforcement learning agent see the current quantization information of the whole network, which, compared with local observation features, is more conducive to modeling inter-layer relations and improves robustness.
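A minimal sketch of how such a global observation vector could be assembled is shown below. The function name and the flat-list layout are assumptions; in practice the features would likely be normalized before being fed to the agent.

```python
def build_observation(layer_bits, layer_flops, current_layer):
    """Concatenate one (bits, flops) pair per layer, then the current layer index.

    For N layers the result has 2 * N + 1 entries.
    """
    if len(layer_bits) != len(layer_flops):
        raise ValueError("one bit width and one FLOPs value per layer expected")
    obs = []
    for bits, flops in zip(layer_bits, layer_flops):
        obs.extend([float(bits), float(flops)])
    obs.append(float(current_layer))
    return obs


# Three layers (FP32, Int8, FP32); the agent is currently deciding layer 1.
print(build_observation([32, 8, 32], [1.0e6, 4.0e6, 2.0e5], current_layer=1))
```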

Step S2053: determining loss data of the mixed precision network relative to the network to be quantized according to the verification data set.

In this embodiment, first output data of the network to be quantized and second output data of the mixed precision network are determined according to the verification data, and the loss data corresponding to the verification data is determined according to the first output data and the second output data. That is, the verification data is used as the input of both the network to be quantized and the mixed precision network, which produce the first output data and the second output data respectively. The verification data set comprises a plurality of pieces of verification data, and the loss data corresponding to each of them together form the data distribution loss. The present embodiment uses the distribution difference between the two network outputs as the criterion for evaluating the quantization strategy.
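The per-sample loss computation can be sketched as follows. The application does not fix a particular distance between the two outputs, so mean squared error is used here purely as an assumption; the two toy "networks" are also hypothetical stand-ins (the second one rounds its outputs to mimic quantization error).

```python
def loss_data(reference_net, quantized_net, dataset):
    """Return one loss value per verification sample.

    Each network is a callable mapping an input list to an output list.
    The loss is the mean squared difference between the two outputs.
    """
    losses = []
    for sample in dataset:
        first = reference_net(sample)    # first output data
        second = quantized_net(sample)   # second output data
        mse = sum((a - b) ** 2 for a, b in zip(first, second)) / len(first)
        losses.append(mse)
    return losses


def reference(x):
    return [2.0 * v for v in x]


def quantized(x):
    return [float(round(2.0 * v)) for v in x]


print(loss_data(reference, quantized, [[0.2, 0.4], [1.0, 2.0]]))
```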

In one example, the mixed precision network determined by the policy network is an uncalibrated quantized network. In this case, step S205 may further include the following sub-step: calibrating the mixed precision network according to the verification data set.

As shown in FIG. 4, taking one network layer (such as a convolutional layer or a fully-connected layer) in the network to be quantized as an example, there are generally two places where quantization is required, namely the network input and the network parameters; the network parameters are taken as an example below. If the layer is quantized to fixed-point calculation and the data type of its parameters is Int8, the Int8 fixed-point data ranges from -127 to 127. It can be seen that going from the network to be quantized to the quantized network is a linear mapping, which can be expressed by the formula y = scale * x + bias, where x represents a floating-point parameter value, y represents a fixed-point parameter value, scale represents the scaling ratio of the data mapping (mapping ratio for short), and bias represents the offset of the data mapping (mapping offset for short). Calibrating the mixed precision network means determining the mapping ratio scale and the mapping offset bias.

In specific implementation, the mixed precision network may be calibrated according to the verification data set as follows: the verification data is used as input data of the uncalibrated mixed precision network, and the mapping ratio and the mapping offset are determined according to the output data of the uncalibrated mixed precision network.
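As a non-limiting illustration, determining the mapping ratio and the mapping offset of the linear mapping y = scale × x + bias can be sketched as follows. The embodiment does not specify how the two values are derived from the observed data; a simple minimum/maximum rule is assumed here, and the names `calibrate` and `quantize` are hypothetical.

```python
from typing import Iterable, Tuple

QMIN, QMAX = -127, 127  # Int8 value range used in this embodiment


def calibrate(observed: Iterable[float]) -> Tuple[float, float]:
    """Return (scale, bias) mapping the observed [min, max] onto [QMIN, QMAX].

    Min/max calibration is an assumption of this sketch.
    """
    values = list(observed)
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: a constant tensor is mapped to fixed-point 0.
        return 1.0, -lo
    scale = (QMAX - QMIN) / (hi - lo)
    bias = QMIN - scale * lo
    return scale, bias


def quantize(x: float, scale: float, bias: float) -> int:
    """Apply y = scale * x + bias, round, and clamp to the Int8 range."""
    y = round(scale * x + bias)
    return max(QMIN, min(QMAX, y))


# Values observed while running verification data through an uncalibrated layer.
scale, bias = calibrate([-1.0, 0.0, 0.5, 1.0])
q_low = quantize(-1.0, scale, bias)
q_high = quantize(1.0, scale, bias)
q_clamped = quantize(2.0, scale, bias)  # outside the calibrated range
```

Values outside the calibrated range, such as the last call, are clamped to the ends of the Int8 range rather than overflowing.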

With the method provided by this embodiment, the mixed precision network is calibrated according to the verification data to obtain a calibrated quantized network. The verification data is then input into the calibrated quantized network to obtain the second output data.

Step S2055: determining a target mixed precision network from the plurality of mixed precision networks according to the loss data.

When the quality of a quantization strategy is evaluated by the reinforcement learning algorithm, a reward function can be used. For example, the quality of the quantized network can be described directly by the mean of the loss data corresponding to the individual pieces of verification data.

In one example, step S2055 may be implemented as follows: the mean of the loss data is used as the exponent of an exponential function, and the target mixed precision network is determined from the plurality of mixed precision networks according to the value of the exponential function. In specific implementation, the value of each mixed precision network can be determined according to the exponential function value, and the target mixed precision network is determined according to that value. Introducing the nonlinear exponential function turns the reward function into a convex function, such as e^abs(·), so that the reward better matches changes in the accuracy of the quantized network.
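As a non-limiting illustration, the exponential evaluation can be sketched as follows. How the value of a network is derived from the exponential function value is not detailed in this example; the sketch assumes that a smaller exponential function value indicates a better quantization strategy. The names `exponential_score`, `select_target` and the candidate labels are hypothetical.

```python
import math
from typing import Dict, Sequence


def exponential_score(losses: Sequence[float]) -> float:
    """e ** abs(mean loss): the mean loss is the exponent of the exponential."""
    mean_loss = sum(losses) / len(losses)
    return math.exp(abs(mean_loss))


def select_target(candidates: Dict[str, Sequence[float]]) -> str:
    """Pick the candidate with the smallest score (assumed to be the best)."""
    return min(candidates, key=lambda name: exponential_score(candidates[name]))


# Per-sample loss data for three hypothetical mixed precision networks.
candidates = {
    "strategy_a": [0.10, 0.30],
    "strategy_b": [0.05, 0.05],
    "strategy_c": [0.50, 0.70],
}
target = select_target(candidates)
```

Because the exponential grows faster than its argument, a given increase in mean loss is penalized more heavily when the loss is already large.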

In one example, step S2055 may be implemented as follows: the target mixed precision network is determined from the plurality of mixed precision networks according to the loss data and the resource usage amount of each mixed precision network. In specific implementation, the value of each mixed precision network can be determined according to its loss data and its resource usage amount, and the target mixed precision network is determined according to that value. A weight for the loss data and a weight for the resource usage amount can be set, so that the precision loss and the resource usage amount of the quantized network are considered together and a target quantized network meeting the user's requirements is obtained.
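As a non-limiting illustration, weighting the loss data against the resource usage amount can be sketched as follows. The form of the combination is not specified in this example; a negated weighted sum is assumed, with the resource usage amount normalized to a fraction of the unquantized network. The names `Candidate`, `candidate_value` and `select_target` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    name: str
    losses: Sequence[float]   # per-sample loss data
    resource_usage: float     # assumed normalized, e.g. 0.3 = 30% of original


def candidate_value(c: Candidate, loss_weight: float, resource_weight: float) -> float:
    """Higher is better: negated weighted sum of mean loss and resource usage."""
    mean_loss = sum(c.losses) / len(c.losses)
    return -(loss_weight * mean_loss + resource_weight * c.resource_usage)


def select_target(candidates: Sequence[Candidate],
                  loss_weight: float = 1.0,
                  resource_weight: float = 1.0) -> Candidate:
    """Return the candidate with the highest combined value."""
    return max(candidates,
               key=lambda c: candidate_value(c, loss_weight, resource_weight))


accurate = Candidate("accurate", losses=[0.02], resource_usage=0.60)
compact = Candidate("compact", losses=[0.10], resource_usage=0.30)

balanced_choice = select_target([accurate, compact])
precision_first = select_target([accurate, compact], loss_weight=10.0)
```

With equal weights the more compact network is selected; raising the weight of the loss data shifts the choice to the more accurate network.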

Experiments show that, with the method provided by the embodiments of the present application, a lightweight convolutional neural network with a search space of about 2^53 candidates converges on a target quantized network after only a few hundred searches.

As can be seen from the foregoing embodiments, the mixed precision neural network quantization method provided in the embodiments of the present application determines a mixed precision quantization strategy by combining reinforcement learning with verification data generation, so that the distribution gap between the outputs of a mixed precision network and the network to be quantized is determined based on an automatically generated verification data set and used as the standard for evaluating the effect of the quantization strategy. With this processing mode, the user does not need to provide verification data for verifying the accuracy of the quantized network, and the time spent verifying that accuracy is reduced. The quantized network determined by reinforcement learning trades a small precision loss for a large reduction in calculation amount without requiring the user to fine-tune the quantized network, so the network quantization process is simplified and the search for a model quantization strategy is transparent to the user. Therefore, while the precision loss of the model is kept small, model quantization efficiency can be effectively improved, the user's data privacy is protected, and labor cost is reduced. In addition, this processing mode enables the converted network layers that use fixed-point calculation to actually perform low-bit convolution calculation, obtaining a large reduction in calculation amount for a small precision loss and thereby effectively reducing the precision loss of the model.

Second embodiment

The foregoing embodiment provides a mixed precision neural network quantization method; correspondingly, the present application also provides a mixed precision neural network quantization apparatus. The apparatus corresponds to the method embodiment described above. Since the apparatus embodiment is substantially similar to the method embodiment, it is described relatively briefly, and reference may be made to the corresponding parts of the method embodiment for relevant details. The apparatus embodiment described below is merely illustrative.

The present application additionally provides a hybrid precision neural network quantization apparatus, comprising:

the network acquisition unit is used for acquiring a network to be quantized;

Optionally, the reinforcement learning unit includes:

Optionally, the loss determining subunit includes:

the prediction subunit is used for determining first output data of a network to be quantized and second output data of the mixed precision network according to the verification data set;

and the calculating subunit is used for determining the loss data according to the first output data and the second output data.

Optionally, the network selection subunit is specifically configured to use the mean of the loss data as the exponent of an exponential function, and to determine the target mixed precision network from the plurality of mixed precision networks according to the value of the exponential function.

Optionally, the network selecting subunit is specifically configured to determine a target mixed precision network from multiple mixed precision networks according to the loss data and the resource usage amount of the mixed precision network.

Optionally, the reinforcement learning unit further includes:

and the calibration subunit is used for performing calibration processing on the mixed precision network according to the verification data set.

Optionally, the network quantization subunit is specifically configured to, if the resource usage amount of the hybrid precision network is greater than a resource usage amount threshold, reduce the precision of a partial layer of the hybrid precision network.

Optionally, the network quantization subunit is specifically configured to determine the partial layer in a random manner if the resource usage amount of the hybrid precision network is greater than a resource usage amount threshold.

Optionally, the apparatus further includes:

the threshold value determining unit is used for determining the resource usage threshold value of the network to be quantized according to the equipment resource data of the target equipment of the network to be quantized;

the reinforcement learning unit is specifically configured to determine the target hybrid precision network according to the verification data set and the resource usage threshold by using a reinforcement learning algorithm;

and the network deployment unit is used for deploying the target mixed precision network to the target equipment.
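As a non-limiting illustration, the optional behaviour of the network quantization subunit, namely reducing the precision of randomly determined partial layers while the resource usage amount exceeds the threshold, can be sketched as follows. The candidate bit widths, the resource model (bit width multiplied by the calculation amount of each layer) and the names `resource_usage` and `enforce_threshold` are assumptions of this sketch.

```python
import random
from typing import List, Optional, Sequence

BIT_CHOICES = (4, 8, 16)  # hypothetical candidate precisions, low to high


def resource_usage(bits: Sequence[int], ops: Sequence[int]) -> int:
    """Assumed resource model: sum over layers of bit width x calculation amount."""
    return sum(b * o for b, o in zip(bits, ops))


def enforce_threshold(bits: Sequence[int], ops: Sequence[int], threshold: int,
                      rng: Optional[random.Random] = None) -> List[int]:
    """Lower randomly chosen layers one precision step at a time until the
    usage fits the threshold or no layer can be reduced further.

    Every entry of `bits` must be one of BIT_CHOICES.
    """
    rng = rng or random.Random()
    result = list(bits)
    while resource_usage(result, ops) > threshold:
        reducible = [i for i, b in enumerate(result) if b > BIT_CHOICES[0]]
        if not reducible:
            break  # every layer is already at the lowest precision
        layer = rng.choice(reducible)  # partial layer determined at random
        result[layer] = BIT_CHOICES[BIT_CHOICES.index(result[layer]) - 1]
    return result


layer_bits = [16, 16, 8]
layer_ops = [100, 200, 50]
constrained = enforce_threshold(layer_bits, layer_ops, threshold=3000,
                                rng=random.Random(0))
```

Passing a seeded `random.Random` makes the random selection reproducible, which can be convenient when comparing searches.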

Third embodiment

The foregoing embodiment provides a mixed precision neural network quantization method; correspondingly, the present application also provides an electronic device. The device corresponds to the method embodiment described above. Since the device embodiment is substantially similar to the method embodiment, it is described relatively briefly, and reference may be made to the corresponding parts of the method embodiment for relevant details. The device embodiment described below is merely illustrative.

The present application additionally provides an electronic device comprising a processor and a memory. The memory is used for storing a program implementing the mixed precision neural network quantization method; after the device is powered on, the program is run by the processor to perform the method.

Fourth embodiment

The foregoing embodiment provides a mixed precision neural network quantization method; correspondingly, the present application also provides a mixed precision neural network quantization system. The system corresponds to the method embodiment described above. Since the system embodiment is substantially similar to the method embodiment, it is described relatively briefly, and reference may be made to the corresponding parts of the method embodiment for relevant details. The system embodiment described below is merely illustrative.

The present application additionally provides a mixed precision neural network quantization system, comprising: a network construction device, the mixed precision neural network quantization device of the above embodiment, and a network deployment device.

The network construction device is used for obtaining, through learning, a machine learning model based on a neural network; the network deployment device is used for deploying the target mixed precision neural network obtained by the mixed precision neural network quantization device to the device side for operation.

Although the present application has been described with reference to the preferred embodiments, these embodiments are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application should be determined by the claims that follow.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

1. Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media does not include transitory computer readable media (transitory media), such as modulated data signals and carrier waves.

2. As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims

1. A mixed precision neural network quantization method, comprising:

acquiring a network to be quantized;

generating a verification data set according to a network to be quantized;

2. The method according to claim 1,

the determining a target mixed precision network of the network to be quantized according to the verification data set by a reinforcement learning algorithm comprises the following steps:

3. The method of claim 2, wherein determining loss data of the mixed-precision network relative to a to-be-quantized network from the validation dataset comprises:

4. The method of claim 2, wherein determining a target mixed-precision network from a plurality of mixed-precision networks based on the loss data comprises:

5. The method of claim 2, wherein determining a target mixed-precision network from a plurality of mixed-precision networks based on the loss data comprises:

6. The method according to claim 2,

the determining a target mixed precision network of the network to be quantized according to the verification data set by a reinforcement learning algorithm further comprises:

7. The method according to claim 2,

the determining a plurality of mixed precision networks of the network to be quantized according to the resource usage threshold value comprises:

8. The method of claim 7, wherein said reducing the precision of the partial layers of the hybrid precision network comprises:

determining the partial layer in a random manner.

9. The method according to claim 1,

the observed features of the reinforcement learning algorithm include: the number of calculation precision bits of a plurality of network layers of the mixed precision network, the calculation amount of the plurality of network layers, and the identifier of the network layer currently being quantized.

10. The method of claim 1, further comprising:

deploying a target hybrid precision network to the target device.

11. A hybrid precision neural network quantization apparatus, comprising:

the network acquisition unit is used for acquiring a network to be quantized;

12. The apparatus of claim 11,

the reinforcement learning unit includes:

13. An electronic device, comprising:

a processor; and

a memory for storing a program implementing the mixed precision neural network quantization method according to any one of claims 1-10, wherein after the device is powered on, the program is run by the processor to perform the method.

14. A hybrid precision neural network processing system, comprising:

a network construction apparatus, the mixed precision neural network quantization apparatus according to claim 11, and a network deployment apparatus.