CN113449854A - Method and device for mixed-precision quantization of a network model, and computer storage medium



Publication number
CN113449854A
CN113449854A
Authority
CN
China
Prior art keywords
network model
sensitivity
network
data
processing
Legal status: Pending
Application number
CN202111000718.7A
Other languages
Chinese (zh)
Inventor
程文华
康瑞鹏
吕倪祺
方民权
游亮
龙欣
Current Assignee
Alibaba China Co Ltd
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba China Co Ltd
Alibaba Cloud Computing Ltd
Application filed by Alibaba China Co Ltd and Alibaba Cloud Computing Ltd
Priority to CN202111000718.7A
Publication of CN113449854A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/08 Learning methods

Abstract

The embodiment of the application provides a method and device for mixed-precision quantization of a network model, and a computer storage medium. The method comprises: acquiring a network model and a configured network model corresponding to the network model, wherein data of the network layers in the network model is of a first data type, data of at least one network layer in the configured network model is of a second data type, and the data processing precision corresponding to the second data type is lower than that corresponding to the first data type; generating calibration data corresponding to the network model; processing the calibration data by using the network model to obtain a first processing result; processing the calibration data by using the configured network model to obtain a second processing result; and determining, based on the first processing result and the second processing result, a sensitivity corresponding to at least one network layer in the network model. With this technical scheme, mixed-precision quantization of the model can be realized without the user providing any data.

Description

Method and device for mixed-precision quantization of a network model, and computer storage medium
Technical Field
The present application relates to the field of network model technologies, and in particular, to a method and apparatus for mixed-precision quantization of a network model, and a computer storage medium.
Background
With the rapid development of graphics processing units (GPUs), low-bit fixed-point computation can be several times faster and more efficient than floating-point computation; for example, integer int8 and int4 computation can be several times faster than floating-point FP32 and FP16 computation. Since fixed-point computation greatly reduces the size of a network model and increases its data processing speed on mobile terminals, a common conversion idea for a network model is to convert floating-point numbers into integers for calculation. However, common conversion methods such as quantization-aware training, post-training quantization, and mixed-quantization search often require the user to provide labeled data, which makes the whole quantization operation perceptible to the user; moreover, the labeled data can vary greatly, so the precision loss of the network model is large.
Disclosure of Invention
The embodiments of the application provide a method and device for mixed-precision quantization of a network model, and a computer storage medium, with which mixed-precision quantization of a network model can be performed automatically without the user providing labeled data; this ensures the accuracy of the mixed-precision quantization method and helps improve the data processing performance and efficiency of the network model.
In a first aspect, an embodiment of the present application provides a mixed-precision quantization method for a network model, including:
acquiring a network model and a configured network model corresponding to the network model, wherein data of network layers in the network model is of a first data type, data of at least one network layer in the configured network model is of a second data type, and data processing precision corresponding to the second data type is lower than that corresponding to the first data type;
generating calibration data corresponding to the network model;
processing the calibration data by using the network model to obtain a first processing result;
processing the calibration data by using the configured network model to obtain a second processing result;
determining a sensitivity corresponding to at least one network layer in the network model based on the first processing result and the second processing result.
In a second aspect, an embodiment of the present application provides a mixed-precision quantization apparatus for a network model, including:
a first acquisition module, configured to acquire a network model and a configured network model corresponding to the network model, wherein data of the network layers in the network model is of a first data type, data of at least one network layer in the configured network model is of a second data type, and the data processing precision corresponding to the second data type is lower than that corresponding to the first data type;
the first generation module is used for generating calibration data corresponding to the network model;
the first processing module is used for processing the calibration data by utilizing the network model to obtain a first processing result;
the first processing module is further configured to process the calibration data by using the configured network model to obtain a second processing result;
a first determination module to determine a sensitivity corresponding to at least one network layer in the network model based on the first processing result and the second processing result.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor, where the memory is configured to store one or more computer instructions, and the one or more computer instructions, when executed by the processor, implement the mixed-precision quantization method for a network model according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer storage medium for storing a computer program, where the computer program enables a computer to execute the mixed-precision quantization method for a network model according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product, including: a computer-readable storage medium storing computer instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the mixed-precision quantization method for a network model according to the first aspect.
In a sixth aspect, an embodiment of the present application provides a mixed-precision quantization method for a network model, including:
determining, in response to a precision quantization invocation request, a processing resource corresponding to a mixed-precision quantization service of the network model;
performing the following steps with the processing resource: acquiring a network model and a configured network model corresponding to the network model, wherein data of network layers in the network model is of a first data type, data of at least one network layer in the configured network model is of a second data type, and data processing precision corresponding to the second data type is lower than that corresponding to the first data type; generating calibration data corresponding to the network model; processing the calibration data by using the network model to obtain a first processing result; processing the calibration data by using the configured network model to obtain a second processing result; determining a sensitivity corresponding to at least one network layer in the network model based on the first processing result and the second processing result.
In a seventh aspect, an embodiment of the present application provides a mixed-precision quantization device for a network model, where the device includes:
a second determination module, configured to determine, in response to a precision quantization invocation request, the processing resource corresponding to a mixed-precision quantization service of the network model;
a second processing module, configured to perform the following steps using the processing resource: acquiring a network model and a configured network model corresponding to the network model, wherein data of network layers in the network model is of a first data type, data of at least one network layer in the configured network model is of a second data type, and data processing precision corresponding to the second data type is lower than that corresponding to the first data type; generating calibration data corresponding to the network model; processing the calibration data by using the network model to obtain a first processing result; processing the calibration data by using the configured network model to obtain a second processing result; determining a sensitivity corresponding to at least one network layer in the network model based on the first processing result and the second processing result.
In an eighth aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor, where the memory is configured to store one or more computer instructions, and the one or more computer instructions, when executed by the processor, implement the mixed-precision quantization method for a network model according to the sixth aspect.
In a ninth aspect, an embodiment of the present application provides a computer storage medium for storing a computer program which, when executed by a computer, implements the mixed-precision quantization method for a network model according to the sixth aspect.
In a tenth aspect, an embodiment of the present application provides a computer program product, including: a computer-readable storage medium storing computer instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the mixed-precision quantization method for a network model according to the sixth aspect.
According to the technical scheme provided by the embodiments of the application, a network model and a configured network model are acquired, where data of the network layers in the network model is of a first data type and data of at least one network layer in the configured network model is of a second data type; calibration data corresponding to the network model is then generated, and the calibration data is processed by the network model and by the configured network model respectively to obtain a first processing result and a second processing result; the sensitivity corresponding to the network model is then determined from the first processing result and the second processing result. Thus, mixed-precision quantization of the network model can be effectively realized without the user providing any labeled data, which ensures the precision of the mixed-precision quantization method, helps improve the data processing performance and efficiency of the network model, and further improves the practicability of the mixed-precision quantization method.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a scene schematic diagram of a method for quantizing a mixing precision of a network model according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a method for quantizing the hybrid precision of a network model according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of determining a sensitivity corresponding to at least one network layer in the network model based on the first processing result and the second processing result according to an embodiment of the present application;
FIG. 4 is a schematic flow chart illustrating a process for determining quantization information corresponding to the network model based on sensitivities corresponding to at least one convolutional layer in the network model according to an embodiment of the present disclosure;
fig. 5 is a schematic flowchart of a method for quantizing the hybrid precision of a network model according to an embodiment of the present application;
fig. 6 is a schematic flowchart of another method for quantifying the hybrid accuracy of a network model according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a hybrid precision quantization apparatus for a network model according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an electronic device corresponding to the hybrid precision quantization apparatus of the network model shown in fig. 7;
fig. 9 is a schematic structural diagram of another apparatus for quantizing hybrid precision of a network model according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device corresponding to the hybrid precision quantization apparatus of the network model shown in fig. 9.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the embodiments of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; "a plurality of" typically means at least two, but does not exclude the case of at least one.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.
The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined" or "in response to determining" or "when (a stated condition or event) is detected" or "in response to detecting (a stated condition or event)", depending on the context.
It is also noted that the terms "comprises", "comprising", and any other variation thereof are intended to cover a non-exclusive inclusion, such that an article or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such article or system. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other like elements in the article or system that comprises the element.
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
In order to facilitate those skilled in the art to understand the technical solutions provided in the embodiments of the present application, the following description is provided for the related technologies:
With the rapid development of GPUs (e.g., the A100 and A10 chips), low-bit fixed-point computation can be several times faster and more efficient than floating-point computation; for example, integer int8 and int4 computation can be several times faster than floating-point FP32 and FP16 computation. Since fixed-point computation greatly reduces the size of a network model and increases its data processing speed on mobile terminals, a common conversion idea for a network model is to convert floating-point numbers into integers for calculation. However, common conversion methods such as quantization-aware training, post-training quantization, and mixed-quantization search often require the user to provide labeled data, which makes the whole quantization operation perceptible to the user; moreover, the labeled data can vary greatly, so the precision loss of the network model is large.
In order to solve the above technical problem, the related art provides a sensitivity evaluation method in which pictures are generated in a self-supervised manner using Batch Normalization (BN) statistics, and the network model is then quantized based on the generated pictures. However, because of the quantization insertion positions and the asymmetric quantization adopted in this implementation, the convolutional layers in the network model often cannot be converted into true int8 convolutions, so the speedup is relatively modest; conversely, adjusting the insertion positions and the quantization strategy is accompanied by a large precision loss. In addition, this implementation also needs to compute the distribution gaps of all network layers, which is mainly used for mixed quantization at lower bit widths (6-bit, 4-bit). This easily results in a relatively large amount of computation, and because the network layers do not perform the same function, directly superposing and comparing them may introduce unnecessary errors.
In order to solve the above technical problem, the present embodiment provides a mixed-precision quantization method, apparatus, device, and storage medium for a network model. The mixed-precision quantization method is a simple and effective sensitivity evaluation approach that is imperceptible to the user: the user only needs to provide a model interface, and mixed-precision quantization can be performed on the network model without any other information, with a high degree of quantization and very small precision loss. Specifically, the execution subject of the mixed-precision quantization method may be a mixed-precision quantization apparatus, and the mixed-precision quantization apparatus may be communicatively connected with a client, as shown in fig. 1:
the client may be any computing device with certain data transmission capability, and the basic structure of the client may include: at least one processor. The number of processors depends on the configuration and type of client. The client may also include a Memory, which may be volatile, such as RAM, or non-volatile, such as Read-Only Memory (ROM), flash Memory, etc., or may include both types. The memory typically stores an Operating System (OS), one or more application programs, and may also store program data and the like. In addition to the processing unit and the memory, the client includes some basic configurations, such as a network card chip, an IO bus, a display component, and some peripheral devices. Alternatively, some peripheral devices may include, for example, a keyboard, a mouse, a stylus, a printer, and the like. Other peripheral devices are well known in the art and will not be described in detail herein. Alternatively, the client may be a pc (personal computer) terminal, a handheld terminal (e.g., a smart phone, a tablet computer), or the like.
The hybrid precision quantization apparatus is a device that can provide a hybrid precision quantization service of a network model in a network virtual environment, and generally refers to an apparatus that performs information planning and hybrid precision quantization operation of the network model by using a network. In terms of physical implementation, the hybrid precision quantization apparatus may be any device capable of providing a computing service, responding to a service request, and performing processing, for example: can be cluster servers, regular servers, cloud hosts, virtual centers, and the like. The mixed precision quantization device mainly comprises a processor, a hard disk, a memory, a system bus and the like, and is similar to a general computer framework.
In this embodiment, the client may be in network connection with the mixed-precision quantization apparatus, and the network connection may be a wireless or wired network connection. If the client is communicatively connected to the mixed-precision quantization apparatus over a mobile network, the network format of the mobile network may be any one of 2G (GSM), 2.5G (GPRS), 3G (WCDMA, TD-SCDMA, CDMA2000, UMTS), 4G (LTE), 4G+ (LTE+), WiMax, and 5G.
In this embodiment of the present application, a client may generate or obtain a model quantization request, where the model quantization request may include a network model to be analyzed. Specifically, the embodiment does not limit the specific implementation manner of generating or obtaining the model quantization request by the client, for example: the client is provided with an interactive interface, the execution operation input by a user is obtained through the interactive interface, and a model quantization request is generated through the execution operation; or, a specific interface may be set on the client, and the model quantization request may be acquired through the specific interface. After the model quantization request is acquired, the model quantization request may be uploaded to the hybrid precision quantization apparatus, so that the hybrid precision quantization apparatus may perform a quantization processing operation on the network model included in the uploaded model quantization request.
The hybrid precision quantization device is used for receiving a model quantization request uploaded by a client, wherein the model quantization request comprises a network model to be analyzed, and then parameter configuration can be performed on the network model to obtain a configured network model corresponding to the network model, wherein data of a network layer in the network model is of a first data type, data of at least one network layer in the configured network model is of a second data type, and the data processing precision corresponding to the second data type is lower than that corresponding to the first data type.
After the network model is obtained, calibration data corresponding to the network model can be automatically generated, and then the calibration data can be processed by the network model and the configured network model respectively, so that a first processing result corresponding to the network model and a second processing result corresponding to the configured network model can be obtained.
In order to ensure the accuracy of the mixed-precision quantization operation on the network model, it may be assumed that the output-layer distribution of the network model satisfies a Gaussian distribution, i.e., the first processing result and the second processing result satisfy Gaussian distributions. After the first processing result and the second processing result are obtained, they may be analyzed; specifically, the distribution distance between the first processing result and the second processing result may be obtained directly, so as to determine the sensitivity corresponding to at least one network layer in the network model based on the distribution distance. In a specific implementation, the mixed-precision quantization apparatus can perform mixed-quantization evaluation on network models of the first data type and the second data type; the calculation steps of the quantization operation are simple, and the obtained quantization information is more comparable. For example, a mixed-quantization evaluation of FP and int8 network models can convert about 90% of the convolution operations of the network model into int8 convolution operations while keeping the loss of data processing accuracy of the network model under control.
According to the technical scheme provided by this embodiment, a network model and a configured network model are acquired, where data of the network layers in the network model is of a first data type and data of at least one network layer in the configured network model is of a second data type; calibration data corresponding to the network model is then generated, and the calibration data is processed by the network model and by the configured network model respectively to obtain a first processing result and a second processing result; the sensitivity corresponding to at least one network layer in the network model is then determined based on the first processing result and the second processing result. This effectively realizes mixed-precision quantization of the network model without the user providing any labeled data, which not only ensures the precision of the mixed-precision quantization method but also helps improve the data processing performance and efficiency of the network model, further improving the practicability of the mixed-precision quantization method.
The following describes a method, an apparatus, a device, and a storage medium for quantizing the mixing precision of a network model according to various embodiments of the present application with an exemplary application scenario.
Fig. 2 is a schematic flowchart of a method for quantizing the hybrid precision of a network model according to an embodiment of the present disclosure; referring to fig. 2, the present embodiment provides a hybrid precision quantization method for a network model, which can implement quantization operation of a hybrid precision model without user perception. The execution subject of the method may be a mixed precision quantization apparatus of the network model, and it is understood that the mixed precision quantization apparatus may be implemented as software, or a combination of software and hardware. Specifically, the mixed precision quantization method may include:
step S201: the method comprises the steps of obtaining a network model and a configured network model corresponding to the network model, wherein data of network layers in the network model is of a first data type, data of at least one network layer in the configured network model is of a second data type, and data processing precision corresponding to the second data type is lower than that corresponding to the first data type.
Step S202: calibration data corresponding to the network model is generated.
Step S203: and processing the calibration data by using the network model to obtain a first processing result.
Step S204: and processing the calibration data by using the configured network model to obtain a second processing result.
Step S205: based on the first processing result and the second processing result, a sensitivity corresponding to at least one network layer in the network model is determined.
The above steps are explained in detail below:
step S201: the method comprises the steps of obtaining a network model and a configured network model corresponding to the network model, wherein data of network layers in the network model is of a first data type, data of at least one network layer in the configured network model is of a second data type, and data processing precision corresponding to the second data type is lower than that corresponding to the first data type.
For example, when the network layer is a convolutional layer, the number of convolutional layers may differ based on the specific functions and purposes of the network model. In this case, the data of the convolutional layers is of the first data type; in some examples, the first data type may be floating-point data (single-precision or double-precision floating-point data), that is, the network model analyzes and processes data in a full-precision manner.
In order to enable quantization operation on the network model, a configured network model corresponding to the network model may be obtained, where data of at least one network layer in the configured network model is of a second data type, and in some examples, the second data type may be integer data (short integer data, long integer data). Of course, those skilled in the art may also perform other configuration operations on the first data type and the second data type in other manners as long as it is ensured that the data processing precision corresponding to the second data type is lower than that corresponding to the first data type.
In other examples, obtaining the configured network model corresponding to the network model may include: acquiring configuration parameters corresponding to at least one network layer in the network model; and performing integer quantization processing on at least one network layer in the network model based on the configuration parameters to obtain the configured network model, wherein the data of at least one network layer in the configured network model is any one of the following: INT8 integer data, INT4 integer data, or INT16 integer data.
Specifically, after the network model is obtained, the network model may be analyzed to obtain the configuration parameter corresponding to each of at least one network layer in the network model, and then at least one network layer in the network model may be subjected to integer quantization processing based on the configuration parameters, so that the configured network model may be obtained. It is understood that in different application scenarios or under different application requirements, the obtained configured network model may be different, and the data of at least one network layer in the configured network model is any one of the following: INT8 integer data, INT4 integer data, or INT16 integer data. Of course, those skilled in the art can select other second data types according to specific application scenarios and design requirements, which will not be described herein again.
For example, taking a network layer capable of implementing a model quantization operation as a convolutional layer, a first data type as floating point data, and a second data type as integer data as an example, the network model may include a network layer 1, a network layer 2, a network layer 3, a network layer 4, a network layer 5, and a network layer 6, where the network layer 2, the network layer 3, and the network layer 5 are convolutional layers, and then configuration parameters corresponding to the at least one convolutional layer in the network model may be obtained, and a user may configure the at least one convolutional layer in the network model based on data processing requirements and configuration parameters, for example: the data of the network layer 2, the network layer 3 and the network layer 5 in the network model can be adjusted from the first data type to the second data type in sequence, so that the configured network models respectively corresponding to the network layer 2, the network layer 3 and the network layer 5 can be obtained. Alternatively, the data of the network layer 2 and the network layer 3, or the network layer 2 and the network layer 5, or the network layer 3 and the network layer 5 in the network model may be adjusted from the floating point type data to integer type data, so that the configured network models respectively corresponding to the network layer 2 and the network layer 3, or the network layer 2 and the network layer 5, and the network layer 3 and the network layer 5 may be obtained. Or, the data of the network layer 2, the network layer 3, and the network layer 5 in the network model may be adjusted from the floating point type data to integer type data, so that the configured network model corresponding to the network layer 2, the network layer 3, and the network layer 5 in the network model may be obtained.
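As an illustration of this configuration step, the following is a minimal PyTorch-style sketch (all identifiers are hypothetical; the patent does not prescribe an implementation) that builds one configured model per convolutional layer by fake-quantizing that single layer's weights to int8 while leaving the rest of the model at full precision:

```python
import copy

import torch
import torch.nn as nn

def quantize_layer_to_int8(layer: nn.Conv2d) -> nn.Conv2d:
    """Fake-quantize one conv layer's weights to int8 (symmetric, per-tensor scale assumed)."""
    w = layer.weight.data
    scale = w.abs().max() / 127.0                    # assumed symmetric scale
    q = torch.clamp((w / scale).round(), -128, 127)  # snap to the int8 grid
    layer.weight.data = q * scale                    # dequantize back to float
    return layer

def build_configured_models(model: nn.Module):
    """Yield (layer_name, configured_model) pairs, one per convolutional layer."""
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            cfg = copy.deepcopy(model)               # keep the original model intact
            quantize_layer_to_int8(dict(cfg.named_modules())[name])
            yield name, cfg
```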
Step S202: calibration data corresponding to the network model is generated.
After the network model is obtained, calibration data corresponding to the network model may be generated in order to enable a quantitative operation of the network model with a hybrid accuracy. In some examples, generating calibration data corresponding to the network model may include: obtaining model training parameters corresponding to the network model; based on the model training parameters, calibration data corresponding to the network model is generated.
Specifically, after the network model is obtained, the network model may be analyzed to obtain model training parameters corresponding to the network model, and it is understood that the model training parameters are related to a network training process of the network model. After the model training parameters are obtained, the model training parameters can be analyzed to generate calibration data corresponding to the network model, and the obtained calibration data can be data closer to the training data of the network model; wherein the calibration data may be at least one of: image data, text data, audio data, and the like.
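The patent does not disclose the generation algorithm itself; the sketch below shows one data-free approach consistent with the Batch Normalization idea mentioned in the related-art discussion: random inputs are optimized until their per-layer feature statistics match the running mean and variance stored in the model's BN layers (which belong to the model training parameters). All hyperparameters here are illustrative assumptions.

```python
import torch
import torch.nn as nn

def generate_calibration_images(model, shape=(8, 3, 224, 224), steps=200, lr=0.1):
    """Synthesize calibration inputs whose BN-input statistics match the stored running stats."""
    model.eval()
    feats = {}
    hooks = [bn.register_forward_hook(lambda m, inp, out, bn=bn: feats.__setitem__(bn, inp[0]))
             for bn in model.modules() if isinstance(bn, nn.BatchNorm2d)]
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        model(x)  # the hooks record the input features of every BN layer
        loss = sum((f.mean(dim=(0, 2, 3)) - bn.running_mean).pow(2).sum()
                   + (f.var(dim=(0, 2, 3)) - bn.running_var).pow(2).sum()
                   for bn, f in feats.items())
        loss.backward()
        opt.step()
    for h in hooks:
        h.remove()
    return x.detach()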
Step S203: and processing the calibration data by using the network model to obtain a first processing result.
After the network model and the calibration data are obtained, the calibration data can be analyzed and processed by using the network model, that is, the calibration data is input into the network model, so that a first processing result output by the network model can be obtained. In some examples, processing the calibration data using the network model, and obtaining the first processing result may include: processing the calibration data by using a network model to obtain a first characteristic mean value and a first characteristic variance value of each characteristic channel output by the network; and determining a first processing result based on the first feature mean value and the first feature variance value.
Specifically, the calibration data may be input to the network model, so that the network model may analyze and process the calibration data, thereby obtaining a first feature mean value and a first feature variance value of each feature channel output by the network, and then may analyze and process the first feature mean value and the first feature variance value to determine a first processing result. In some examples, the first processing result may be distribution information composed of a first feature mean value and a first feature variance value.
Step S204: and processing the calibration data by using the configured network model to obtain a second processing result.
After the configured network model and the calibration data are obtained, the configured network model may be used to analyze and process the calibration data, that is, the calibration data is input into the configured network model, so that a second processing result output by the configured network model may be obtained. In some examples, processing the calibration data using the configured network model to obtain the second processing result may include: processing the calibration data by using the configured network model to obtain a second feature mean value and a second feature variance value of each feature channel output by the network; and determining a second processing result based on the second feature mean value and the second feature variance value.
Specifically, the calibration data may be input to the configured network model, so that the configured network model may analyze and process the calibration data, thereby obtaining a second feature mean value and a second feature variance value of each feature channel output by the network, and then may analyze and process the second feature mean value and the second feature variance value to determine a second processing result. In some examples, the second processing result may be distribution information composed of a second feature mean value and a second feature variance value.
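A minimal sketch covering both processing steps (S203 and S204), assuming the model output is an NCHW feature tensor; each processing result is summarized as per-channel Gaussian statistics:

```python
import torch

@torch.no_grad()
def output_statistics(model, calib_x):
    """Return the per-channel mean and variance of the model output on the calibration batch."""
    out = model(calib_x)              # assumed NCHW output
    mu = out.mean(dim=(0, 2, 3))      # feature mean per channel
    var = out.var(dim=(0, 2, 3))      # feature variance per channel
    return mu, var
```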
Step S205: based on the first processing result and the second processing result, a sensitivity corresponding to at least one network layer in the network model is determined.
After the first and second processing results are obtained, the first and second processing results may be analyzed to determine a sensitivity corresponding to at least one network layer in the network model. In some examples, a quantization rule for performing a quantization operation on the network model is pre-configured, and after the first processing result and the second processing result are obtained, the first processing result and the second processing result may be analyzed and processed based on the quantization rule, so that a sensitivity corresponding to at least one network layer in the network model may be obtained.
Of course, those skilled in the art may also adopt other ways to "determine the sensitivity corresponding to at least one network layer in the network model based on the first processing result and the second processing result", as long as the accuracy and reliability of determining the sensitivity corresponding to at least one network layer in the network model can be ensured.
After the sensitivity corresponding to at least one network layer in the network model is determined, the network model may be adjusted and optimized based on the sensitivity. For example, suppose sensitivity 1 corresponds to configured network model 1 (and to at least one network layer in the network model), sensitivity 2 corresponds to configured network model 2, and sensitivity 3 corresponds to configured network model 3, where sensitivity 1 < sensitivity 3 < sensitivity 2. Under a given data processing precision requirement and a lower data processing load, configured network model 2 corresponding to sensitivity 2 can have higher data processing accuracy, while configured network model 1 corresponding to sensitivity 1 has lower data processing accuracy; in this case, the network model may be adjusted to configured network model 2 based on the above sensitivities, which is beneficial to improving the data processing performance and quality of the network model.
In the mixed-precision quantization method for a network model provided by this embodiment, a network model and a configured network model are acquired, where data of the network layers in the network model is of a first data type and data of at least one network layer in the configured network model is of a second data type; calibration data corresponding to the network model is then generated, and the calibration data is processed by the network model and by the configured network model respectively to obtain a first processing result and a second processing result; the sensitivity corresponding to the network model is then determined from the first processing result and the second processing result. In this way, mixed-precision quantization of the network model can be effectively realized without the user providing any labeled data, which ensures the precision of the mixed-precision quantization method, helps improve the data processing performance and efficiency of the network model, and further improves the practicability of the mixed-precision quantization method.
FIG. 3 is a schematic flow chart illustrating a process for determining sensitivity corresponding to at least one network layer in a network model based on a first processing result and a second processing result according to an embodiment of the present disclosure; referring to fig. 3, this embodiment provides an implementation manner of determining a sensitivity corresponding to at least one network layer in a network model, and specifically, the determining a sensitivity corresponding to at least one network layer in a network model based on a first processing result and a second processing result in this embodiment may include:
step S301: acquiring distance information between a first processing result and a second processing result;
step S302: based on the distance information, a sensitivity corresponding to at least one network layer in the network model is determined.
Specifically, since the first processing result and the second processing result obtained with the network model and the configured network model satisfy a Gaussian distribution or an approximately Gaussian distribution, after the first processing result and the second processing result are obtained, the distance information between them may be obtained, and the distance information may be any one of the following: Euclidean distance, Manhattan distance, Hamming distance, Wasserstein distance, and the like. After the distance information is acquired, it is analyzed to determine the sensitivity corresponding to at least one network layer in the network model, where the sensitivity corresponding to at least one network layer in the network model is positively correlated with the distance information. That is, the larger the distance information, the larger the sensitivity corresponding to at least one network layer in the network model, i.e., the more sensitive the data processing operation of at least one network layer in the network model is; the smaller the distance information, the smaller the sensitivity corresponding to at least one network layer in the network model, i.e., the less sensitive the data processing operation of at least one network layer in the network model is.
In this embodiment, after the distance information between the first processing result and the second processing result is obtained, the sensitivity corresponding to at least one network layer in the network model is determined based on the distance information, which effectively ensures the accuracy and reliability of determining the sensitivity corresponding to at least one network layer in the network model.
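Because both results are modeled as Gaussians, the 2-Wasserstein distance has a closed form per channel: for N(mu1, s1^2) and N(mu2, s2^2) it is (mu1 - mu2)^2 + (s1 - s2)^2. A sketch of the resulting sensitivity score follows, reusing output_statistics from the earlier sketch; summing over channels is an assumption, not fixed by the patent:

```python
import torch

def gaussian_wasserstein2(mu1, var1, mu2, var2):
    # Closed-form squared 2-Wasserstein distance between per-channel Gaussians.
    return ((mu1 - mu2).pow(2) + (var1.sqrt() - var2.sqrt()).pow(2)).sum()

def layer_sensitivity(model, configured_model, calib_x):
    mu1, var1 = output_statistics(model, calib_x)             # first processing result
    mu2, var2 = output_statistics(configured_model, calib_x)  # second processing result
    return gaussian_wasserstein2(mu1, var1, mu2, var2).item()
```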
In some examples, after the sensitivity corresponding to at least one network layer in the network model is determined, the method further comprises: determining quantization information corresponding to the network model based on the sensitivity corresponding to at least one network layer in the network model, the quantization information comprising: recommendation information for performing the quantization operation on the network model and the sensitivity corresponding to the recommendation information.
After the sensitivity corresponding to at least one network layer in the network model is obtained, it may be analyzed to determine the quantization information corresponding to the network model, where the quantization information may include: the recommendation information used for performing the quantization operation on the network model and the sensitivity corresponding to the recommendation information. This effectively ensures the accuracy and reliability of determining the quantization information corresponding to the network model.
In some examples, referring to fig. 4, this embodiment provides an implementation manner of determining quantitative information corresponding to a network model, and specifically, the method in this embodiment may include:
step S401: and sequencing the obtained sensitivities corresponding to each network layer in the network model to obtain the sequenced sensitivities.
After the configured network models respectively corresponding to at least one network layer in the network models are obtained, the sensitivities corresponding to the configured network models can be obtained, and then all the obtained sensitivities corresponding to the network models can be sequenced, so that the sequenced sensitivities can be obtained. It is understood that the sorted sensitivities may be obtained by sorting all sensitivities from high to low, or may be obtained by sorting all sensitivities from low to high.
Step S402: the sorted sensitivities are divided into a plurality of sensitivity sets.
After the ranked sensitivities are obtained, the ranked sensitivities may be divided into a plurality of sensitivity sets. In some examples, dividing the ranked sensitivities into a plurality of sensitivity sets may include: obtaining a variation amplitude of a first sensitivity and a variation amplitude of a second sensitivity based on the sorted sensitivities, wherein the first sensitivity is adjacent to the second sensitivity; and dividing the sorted sensitivities into a plurality of sensitivity sets based on the variation amplitude of the first sensitivity and the variation amplitude of the second sensitivity.
For example, the ranked sensitivities may include sensitivity a, sensitivity b, sensitivity c, sensitivity d, sensitivity e, and sensitivity f, sorted from low to high. Assuming that the first sensitivity is sensitivity c and the second sensitivity is sensitivity b, the variation amplitude of sensitivity c can be obtained from the sorted sensitivities as (sensitivity c - sensitivity b); similarly, the variation amplitude of sensitivity b can be obtained from the sorted sensitivities as (sensitivity b - sensitivity a).
After obtaining the variation amplitude of the first sensitivity (sensitivity c-sensitivity b) and the variation amplitude of the second sensitivity (sensitivity b-sensitivity a), the variation amplitudes of the first sensitivity and the second sensitivity may be analyzed to divide the sorted sensitivities into a plurality of sensitivity sets.
In some examples, dividing the ranked sensitivities into a plurality of sensitivity sets based on the magnitude of the change in the first sensitivity and the magnitude of the change in the second sensitivity may include: acquiring a sensitivity change ratio between the change amplitude of the first sensitivity and the change amplitude of the second sensitivity; and when the sensitivity change ratio is larger than or equal to a preset threshold value, dividing the sorted sensitivities based on the second sensitivity to obtain a plurality of sensitivity sets.
For example, when the variation amplitude of the first sensitivity is (sensitivity c - sensitivity b) and the variation amplitude of the second sensitivity is (sensitivity b - sensitivity a), the sensitivity change ratio between the two variation amplitudes can be obtained as

$$r = \frac{\text{sensitivity } c - \text{sensitivity } b}{\text{sensitivity } b - \text{sensitivity } a}.$$

After the sensitivity change ratio is obtained, it may be analyzed and compared with a preset threshold T, and when the sensitivity change ratio is greater than or equal to the preset threshold, it indicates that the second sensitivity is an inflection-point sensitivity in the sorted sensitivities. For example, when $r \geq T$, the sorted sensitivities may be divided based on sensitivity b, so as to obtain a plurality of sensitivity sets, which may include: a first sensitivity set (comprising sensitivity a and sensitivity b) and a second sensitivity set (comprising sensitivity c, sensitivity d, sensitivity e, and sensitivity f). In this way, sensitivities with similar values among the sorted sensitivities are grouped into one sensitivity set, i.e., the sorted sensitivities are divided into a plurality of sensitivity blocks.
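A sketch of this partitioning rule: sort the per-layer sensitivities in ascending order, compute the ratio of consecutive change amplitudes, and cut after each inflection-point sensitivity whose ratio reaches the threshold (the value of T is an assumed hyperparameter):

```python
def split_sensitivities(named_sens, T=3.0):
    """Partition {layer_name: sensitivity} into sets of similar sensitivity values."""
    items = sorted(named_sens.items(), key=lambda kv: kv[1])  # ascending order
    if len(items) < 3:
        return [items]
    sets, current = [], [items[0]]
    for i in range(1, len(items) - 1):
        prev_delta = items[i][1] - items[i - 1][1]      # change amplitude at position i
        next_delta = items[i + 1][1] - items[i][1]      # change amplitude at position i+1
        current.append(items[i])
        if prev_delta > 0 and next_delta / prev_delta >= T:
            sets.append(current)                        # cut after the inflection point
            current = []
    current.append(items[-1])
    sets.append(current)
    return sets
```

For instance, with illustrative values {a: 1, b: 2, c: 10, d: 11, e: 12, f: 13} and T = 3, the function returns the two sets from the example above: (a, b) and (c, d, e, f).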
Step S403: based on the plurality of sensitivity sets, quantitative information corresponding to the network model is determined.
After the multiple sensitivity sets are obtained, the multiple sensitivity sets may be analyzed to determine quantitative information corresponding to the network model. In some examples, after determining the quantitative information corresponding to the network model, the method in this embodiment may further include: acquiring target processing precision corresponding to the network model; determining at least one target sensitivity set among the plurality of sensitivity sets based on the target processing precision and the quantization information; and quantifying the network layer corresponding to the at least one target sensitivity set in the network model to obtain the target network model.
After the quantization information corresponding to the network model is determined, the user may adjust and optimize the network model based on the quantization information. Specifically, the target processing precision corresponding to the network model may first be obtained; the target processing precision may be input directly into the mixed-precision quantization apparatus by the user, or the mixed-precision quantization apparatus may be provided with a data processing interface through which the target processing precision transmitted by a third-party device is obtained.
After the target processing precision and the quantitative information are obtained, the target processing precision and the quantitative information can be analyzed and processed to determine at least one target sensitivity set in the sensitivity sets, and then the network layer corresponding to the network model can be quantized based on the at least one target sensitivity set, so that the target network model can be obtained, the data processing precision of the target network model is the target processing precision, and therefore the network model can be effectively adjusted and optimized.
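A hedged sketch of this selection step is given below; it interprets the target processing precision as an accuracy budget and greedily quantizes whole sensitivity sets, from least to most sensitive, while the budget holds. quantize_layer_to_int8 comes from the earlier sketch, and evaluate_accuracy is a hypothetical user-supplied callback, since the patent does not specify how the target precision is checked.

```python
import copy

def select_and_quantize(model, sensitivity_sets, target_accuracy, evaluate_accuracy):
    """Quantize the layers of as many low-sensitivity sets as the precision target allows."""
    best = copy.deepcopy(model)
    for s in sensitivity_sets:                      # ascending sensitivity order
        trial = copy.deepcopy(best)
        modules = dict(trial.named_modules())
        for name, _ in s:
            quantize_layer_to_int8(modules[name])   # int8 for every layer in the set
        if evaluate_accuracy(trial) >= target_accuracy:
            best = trial                            # the whole set is kept
        else:
            break                                   # budget exceeded; stop here
    return best
```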
In the embodiment, the obtained sensitivities corresponding to the network layers in the network model are sequenced to obtain the sequenced sensitivities, then the sequenced sensitivities are divided into a plurality of sensitivity sets, and the quantization information corresponding to the network model is determined based on the sensitivity sets, so that the network model is effectively adjusted and optimized based on the obtained quantization information, a target network model meeting design requirements and user requirements is obtained, and the practicability of the hybrid precision quantization method is further improved.
In a specific application, referring to fig. 5, the execution subject of the quantization method may be a mixed-precision quantization apparatus. When the mixed-precision quantization apparatus executes the method, the user only needs to provide an interface corresponding to the network model and does not need to provide any other information; FP and int8 mixed-precision quantization of the network model can then be realized with a high degree of quantization and very small precision loss. Specifically, the mixed-precision quantization method in this embodiment may include the following steps:
step 1: and acquiring an original model to be quantized, namely importing the original network model which needs to be quantized into the mixed precision quantization device.
Step 2: generate a calibration image corresponding to the network model.
After the network model is acquired, a calibration image corresponding to the network model can be automatically generated based on the network model without any picture provided by the user. Specifically, generating the calibration image corresponding to the network model may include: the method comprises the steps of obtaining training parameters corresponding to a network model, and automatically generating a calibration image corresponding to the network model based on the training parameters, wherein the calibration image can be closer to training data used for training the network model.
Step 3: acquire configuration parameters corresponding to at least one convolutional layer in the network model, and perform integer quantization processing on at least one convolutional layer in the network model based on the configuration parameters to obtain the configured network model, where the data of at least one convolutional layer in the configured network model is INT8 integer data.
Specifically, the configuration parameters corresponding to at least one convolutional layer in the network model may be acquired, and then int8 quantization may be performed on each convolutional layer one by one, so that integer quantization processing is performed on at least one convolutional layer in the network model and the configured network model is obtained.
And 4, step 4: and (5) carrying out quantitative sensitivity analysis on each layer.
The calibration data is input into the network model to obtain a first feature mean and a first feature variance (μ1, σ1) for each feature channel output by the network model. Similarly, the calibration data may be input into the configured network model to obtain a second feature mean and a second feature variance (μ2, σ2) for each feature channel output by the configured network model.
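A minimal sketch of collecting these per-channel statistics, assuming activations in NCHW layout and using the standard deviation as the σ summary (both are assumptions not fixed by the text):

```python
import numpy as np

def channel_stats(feature_maps):
    # feature_maps: (N, C, H, W) activations collected on the calibration data.
    # Returns the per-channel mean and standard deviation, each of shape (C,).
    mu = feature_maps.mean(axis=(0, 2, 3))
    sigma = feature_maps.std(axis=(0, 2, 3))
    return mu, sigma
```

The same routine would be applied to the outputs of both the original and the configured model to obtain (μ1, σ1) and (μ2, σ2).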
Step 5: based on the first feature mean and first feature variance (μ1, σ1) and the second feature mean and second feature variance (μ2, σ2), obtain the sensitivity corresponding to at least one convolution layer in the network model.
Specifically, first distribution information corresponding to the first feature mean and first feature variance (μ1, σ1) and second distribution information corresponding to the second feature mean and second feature variance (μ2, σ2) are obtained, and the distance between the first distribution information and the second distribution information is calculated; in this implementation the distance may be the Wasserstein distance (earth mover's distance). After the distance information is obtained, the sensitivity may be determined from it; it can be understood that the larger the distance, the more sensitive the corresponding convolution layer in the network model is to the reduced precision.
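When each channel's output is summarized as a Gaussian N(μ, σ²), the 2-Wasserstein distance between the two distributions has the closed form √((μ1 − μ2)² + (σ1 − σ2)²), which keeps the per-layer sensitivity cheap to compute. The sketch below averages the per-channel distances into one sensitivity per layer; that aggregation is an assumption, since the text only fixes that a larger distance means a more sensitive layer:

```python
import numpy as np

def wasserstein_gaussian(mu1, sigma1, mu2, sigma2):
    # Closed-form 2-Wasserstein distance between N(mu1, sigma1^2) and
    # N(mu2, sigma2^2), evaluated element-wise per feature channel.
    return np.sqrt((mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2)

def layer_sensitivity(mu1, sigma1, mu2, sigma2):
    # Aggregate the per-channel distances into a single sensitivity for the
    # layer (mean aggregation is an illustrative choice, not from the text).
    return float(wasserstein_gaussian(mu1, sigma1, mu2, sigma2).mean())
```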
Step 6: set a quantization strategy for each layer based on the sensitivity corresponding to at least one convolution layer in the network model.
After at least one network layer in the network model has been quantized in turn, the sensitivity corresponding to each quantized network layer is available. All the obtained sensitivities can be ranked to give the ranked sensitivities, and the quantization strategy of each network layer can then be determined from the ranked sensitivities and the target quantization degree.
Specifically, let S(n) be the sensitivity at the n-th position in the ranked sensitivities, S(n+1) the sensitivity at the (n+1)-th position, and S(n-1) the sensitivity at the (n-1)-th position. The sensitivity change ratio between the change amplitude at the (n+1)-th position and the change amplitude at the n-th position can then be calculated as

r(n) = (S(n+1) − S(n)) / (S(n) − S(n−1)).

After the sensitivity change ratio is obtained, it can be compared with a preset threshold. When the ratio is greater than or equal to the threshold, a mark is added to the n-th sensitivity; the ranked sensitivities are divided into a plurality of sensitivity sets at the marked sensitivities, and the quantization strategy of each layer is then set based on the sensitivity sets.
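A compact sketch of this ratio test; the ascending sort direction, the threshold value, and the handling of very short lists are assumptions the text leaves open:

```python
def split_sensitivity_sets(sensitivities, threshold=2.0):
    # Sort the per-layer sensitivities and split them into sets wherever the
    # sensitivity change ratio r = (s[n+1]-s[n]) / (s[n]-s[n-1]) reaches the
    # preset threshold (2.0 is a placeholder value).
    s = sorted(sensitivities)
    if len(s) < 3:
        return [s]
    sets, current = [], [s[0]]
    for n in range(1, len(s) - 1):
        prev_step = s[n] - s[n - 1]
        next_step = s[n + 1] - s[n]
        current.append(s[n])
        if prev_step > 0 and next_step / prev_step >= threshold:
            sets.append(current)  # boundary after the marked n-th sensitivity
            current = []
    current.append(s[-1])
    sets.append(current)
    return sets
```

Splitting at sharp jumps in the ranked curve keeps layers of similar sensitivity in the same set, so a whole set can share one quantization strategy.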
Step 7: acquire the target processing precision corresponding to the network model, determine at least one target sensitivity set among the plurality of sensitivity sets based on the target processing precision and the quantization information, and quantize the network layers corresponding to the at least one target sensitivity set, obtaining the target network model; that is, the quantization calibration of the network model is realized based on the quantization information.
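How the target processing precision maps onto the number of layers that may be dropped to INT8 is not prescribed by the text; the sketch below assumes it reduces to a simple layer budget consumed from the least-sensitive set upward:

```python
def select_target_sets(sensitivity_sets, budget):
    # sensitivity_sets: output of split_sensitivity_sets, ordered from least
    # to most sensitive; budget: hypothetical number of layers allowed to be
    # quantized to INT8, derived from the user's target processing precision.
    selected, count = [], 0
    for s in sensitivity_sets:
        if count + len(s) > budget:
            break  # quantizing this set too would exceed the budget
        selected.append(s)
        count += len(s)
    return selected
```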
Step 8: export the quantized model.
After the target network model is obtained, the target network model produced by applying the quantization strategy may be exported.
With the above mixed precision quantization method, a target network model meeting the target processing precision input by the user can be obtained, and its data processing precision can be effectively guaranteed. Taking a residual network model and a lightweight network model as examples, the sensitivity corresponding to each quantized convolution layer obtained by the mixed precision quantization method is as follows:
the sensitivity of each layer of the residual network model Resnet18 is: 0.0059, 0.0067, 0.0031, 0.0010, 0.0008, 0.0009, 0.0007, 0.0008, 0.0007, 0.0010, 0.0004, 0.0010, 0.0009, 0.0010, 0.0008, 0.0011, 0.0116.
The sensitivity of each layer of the residual network model Resnet50 is: 0.0101, 0.0086, 0.0042, 0.0058, 0.0121, 0.0055, 0.0030, 0.0029, 0.0039, 0.0017, 0.0022, 0.0051, 0.0033, 0.0024, 0.0071, 0.0014, 0.0016, 0.0015, 0.0017, 0.0043, 0.0033, 0.0022, 0.0054, 0.0043, 0.0048, 0.0066, 0.0011, 0.0031, 0.0016, 0.0014, 0.0023, 0.0051, 0.0015, 0.0022, 0.0021, 0.0018, 0.0027, 0.0022, 0.0025, 0.0035, 0.0055, 0.0027, 0.0023, 0.0024, 0.0021, 0.0034, 0.0024, 0.0021, 0.0025, 0.0024, 0.0021, and 0.0024.
The sensitivity of each layer of the lightweight network model Mobilenetv2 is: 0.1471, 0.1176, 0.0236, 0.0209, 0.0341, 0.0331, 0.0135, 0.0353, 0.0222, 0.0175, 0.0257, 0.0319, 0.0060, 0.0219, 0.0151, 0.0060, 0.0105, 0.0157, 0.0119, 0.0146, 0.0279, 0.0048, 0.0141, 0.0107, 0.0041, 0.0087, 0.0124, 0.0045, 0.0070, 0.0165, 0.0104, 0.0206, 0.0257, 0.0055, 0.0094, 0.0209, 0.0051, 0.0115, 0.0231, 0.0108, 0.0080, 0.1462, 0.0148, 0.0236, 0.0449, 0.0167, 0.0267, 0.02678, 0.03090, 0.0309, 0.0060.044, 0.0029, 0.0023, 0.8, 0.0449.
The network model can then be adjusted and optimized with the obtained sensitivity information to produce a target network model; that is, compared with prior-art quantization approaches, a network model with higher precision can be obtained. Specifically, the data processing precision of the target network model is shown in the following table:
[Table: data processing precision of the target network model]
therefore, the technical scheme provided by the application embodiment not only realizes the quantization operation of the mixing precision of the network model without providing any marking data by a user, but also can ensure the precision of the mixing precision quantization method, is favorable for improving the data processing performance and efficiency of the network model, and further improves the practicability of the mixing precision quantization method.
Fig. 6 is a schematic flowchart of another mixed precision quantization method for a network model according to an embodiment of the present application. Referring to fig. 6, this embodiment provides a mixed precision quantization method whose execution subject may be a mixed precision quantization apparatus; it is understood that this apparatus may be implemented as software, or as a combination of software and hardware. Specifically, the mixed precision quantization method may include:
step S601: responding to the calling precision quantification request, and determining a processing resource corresponding to a mixed precision quantification service of the network model;
step S602: performing the following steps with a processing resource: the method comprises the steps of obtaining a network model and a configured network model corresponding to the network model, wherein data of network layers in the network model is of a first data type, data of at least one network layer in the configured network model is of a second data type, and data processing precision corresponding to the second data type is lower than that corresponding to the first data type; generating calibration data corresponding to the network model; processing the calibration data by using a network model to obtain a first processing result; processing the calibration data by using the configured network model to obtain a second processing result; based on the first processing result and the second processing result, a sensitivity corresponding to at least one network layer in the network model is determined.
Specifically, the mixed precision quantization method for a network model provided by the invention can be executed in the cloud, where a number of compute nodes may be deployed, each with processing resources such as compute and storage. In the cloud, several compute nodes may be organized to provide a service; of course, one compute node may also provide one or more services.
For the scheme provided by the invention, the cloud can provide a service that performs the mixed precision quantization method for a network model, i.e., a mixed precision quantization service. When a user needs this service, they invoke it, triggering a request to the cloud that may carry the network model to be quantized. The cloud determines the compute node that responds to the request and uses the processing resources of that node to perform the following steps: acquiring a network model and a configured network model corresponding to it, where the data of the network layers in the network model is of a first data type, the data of at least one network layer in the configured network model is of a second data type, and the data processing precision of the second data type is lower than that of the first; generating calibration data corresponding to the network model; processing the calibration data with the network model to obtain a first processing result; processing the calibration data with the configured network model to obtain a second processing result; and determining, based on the first and second processing results, the sensitivity corresponding to at least one network layer in the network model.
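A minimal stand-in for this dispatch flow, with a least-loaded scheduling policy chosen purely for illustration; the text does not prescribe how the responding compute node is selected, and all names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ComputeNode:
    # Stand-in for a cloud compute node with compute/storage resources.
    name: str
    load: float = 0.0

def handle_quantization_request(model, nodes, pipeline):
    # model: the network model carried in the precision-quantization request.
    # pipeline: a callable wrapping the steps above (build the configured
    # model, generate calibration data, compare the two processing results,
    # and return per-layer sensitivities).
    node = min(nodes, key=lambda n: n.load)  # pick the least-loaded node
    node.load += 1.0                         # account for the running job
    try:
        return pipeline(model)               # runs on the node's resources
    finally:
        node.load -= 1.0
```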
Specifically, the implementation process, implementation principle and implementation effect of the above method steps in this embodiment are similar to the implementation process, implementation principle and implementation effect of the method steps in the embodiment shown in fig. 1 to 5, and for parts not described in detail in this embodiment, reference may be made to the related description of the embodiment shown in fig. 1 to 5.
Fig. 7 is a schematic structural diagram of a mixed precision quantization apparatus for a network model according to an embodiment of the present disclosure. Referring to fig. 7, this embodiment provides a mixed precision quantization apparatus for a network model; the apparatus is configured to perform the mixed precision quantization method shown in fig. 2. Specifically, the apparatus may include:
the first obtaining module 11 is configured to obtain a network model and a configured network model corresponding to the network model, where data of a network layer in the network model is a first data type, data of at least one network layer in the configured network model is a second data type, and data processing accuracy corresponding to the second data type is lower than data processing accuracy corresponding to the first data type;
a first generating module 12, configured to generate calibration data corresponding to the network model;
the first processing module 13 is configured to process the calibration data by using a network model to obtain a first processing result;
the first processing module 13 is further configured to process the calibration data by using the configured network model to obtain a second processing result;
a first determining module 14, configured to determine a sensitivity corresponding to at least one network layer in the network model based on the first processing result and the second processing result.
In some examples, when the first obtaining module 11 obtains the configured network model corresponding to the network model, the first obtaining module 11 is configured to perform: acquiring configuration parameters corresponding to at least one network layer in the network model; performing integer quantization processing on at least one network layer in the network model based on the configuration parameters to obtain the configured network model, wherein the data of at least one network layer in the configured network model is any one of the following: INT8 integer data, INT4 integer data, INT16 integer data.
In some examples, when the first generation module 12 generates calibration data corresponding to a network model, the first generation module 12 is configured to perform: obtaining model training parameters corresponding to the network model; based on the model training parameters, calibration data corresponding to the network model is generated.
In some examples, when the first processing module 13 processes the calibration data by using the network model to obtain a first processing result, the first processing module 13 is configured to perform: processing the calibration data by using the network model to obtain a first feature mean value and a first feature variance value of each feature channel output by the network; and determining a first processing result based on the first feature mean value and the first feature variance value.
In some examples, when the first processing module 13 processes the calibration data by using the configured network model to obtain the second processing result, the first processing module 13 is configured to perform: processing the calibration data by using the configured network model to obtain a second feature mean value and a second feature variance value of each feature channel output by the network; and determining a second processing result based on the second feature mean value and the second feature variance value.
In some examples, when the first determining module 14 determines the sensitivity corresponding to at least one network layer in the network model based on the first processing result and the second processing result, the first determining module 14 is configured to perform: acquiring distance information between a first processing result and a second processing result; based on the distance information, a sensitivity corresponding to at least one network layer in the network model is determined.
In some examples, the sensitivity corresponding to at least one network layer in the network model is positively correlated with the distance information.
In some examples, after determining the sensitivity corresponding to at least one network layer in the network model, the first determination module 14 is to: determining quantitative information corresponding to the network model based on the sensitivity corresponding to at least one network layer in the network model, the quantitative information comprising: recommendation information for performing a quantitative operation on the network model and a sensitivity corresponding to the recommendation information.
In some examples, when the first determining module 14 determines the quantitative information corresponding to the network model based on the sensitivity corresponding to at least one network layer in the network model, the first determining module 14 is configured to perform: sequencing the obtained sensitivities corresponding to each network layer in the network model to obtain the sequenced sensitivities; dividing the sorted sensitivities into a plurality of sensitivity sets; based on the plurality of sensitivity sets, quantitative information corresponding to the network model is determined.
In some examples, after determining the quantitative information corresponding to the network model, the first obtaining module 11 and the first processing module 13 in this embodiment are configured to perform the following steps:
a first obtaining module 11, configured to obtain a target processing precision corresponding to the network model;
a first processing module 13 for determining at least one target sensitivity set among the plurality of sensitivity sets based on the target processing precision and the quantization information;
the first processing module 13 is further configured to quantize a network layer corresponding to at least one target sensitivity set in the network model, so as to obtain a target network model.
In some examples, when the first determining module 14 divides the ranked sensitivities into a plurality of sensitivity sets, the first determining module 14 is configured to perform: obtaining a variation amplitude of a first sensitivity and a variation amplitude of a second sensitivity based on the sorted sensitivities, wherein the first sensitivity is adjacent to the second sensitivity; and dividing the sorted sensitivities into a plurality of sensitivity sets based on the variation amplitude of the first sensitivity and the variation amplitude of the second sensitivity.
In some examples, when the first determination module 14 divides the ranked sensitivities into a plurality of sensitivity sets based on the magnitude of the change in the first sensitivity and the magnitude of the change in the second sensitivity, the first determination module 14 is configured to perform: acquiring a sensitivity change ratio between the change amplitude of the first sensitivity and the change amplitude of the second sensitivity; and when the sensitivity change ratio is larger than or equal to a preset threshold value, dividing the sorted sensitivities based on the second sensitivity to obtain a plurality of sensitivity sets.
The apparatus shown in fig. 7 can perform the method of the embodiment shown in fig. 1-5, and the detailed description of this embodiment can refer to the related description of the embodiment shown in fig. 1-5. The implementation process and technical effect of the technical solution refer to the descriptions in the embodiments shown in fig. 1 to 5, and are not described herein again.
In one possible design, the structure of the hybrid precision quantifying device of the network model shown in fig. 7 may be implemented as an electronic device, which may be a mobile phone, a tablet computer, a server, or other devices. As shown in fig. 8, the electronic device may include: a first processor 21 and a first memory 22. Wherein the first memory 22 is used for storing a program of a corresponding electronic device to execute the hybrid precision quantization method of the network model provided in the embodiments shown in fig. 1-5, and the first processor 21 is configured to execute the program stored in the first memory 22.
The program comprises one or more computer instructions, wherein the one or more computer instructions, when executed by the first processor 21, are capable of performing the steps of: the method comprises the steps of obtaining a network model and a configured network model corresponding to the network model, wherein data of network layers in the network model is of a first data type, data of at least one network layer in the configured network model is of a second data type, and data processing precision corresponding to the second data type is lower than that corresponding to the first data type; generating calibration data corresponding to the network model; processing the calibration data by using a network model to obtain a first processing result; processing the calibration data by using the configured network model to obtain a second processing result; based on the first processing result and the second processing result, a sensitivity corresponding to at least one network layer in the network model is determined.
Further, the first processor 21 is also used to execute all or part of the steps in the embodiments shown in fig. 1-5.
The electronic device may further include a first communication interface 23 for communicating with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for an electronic device, which includes a program for executing the method for quantizing the blending precision of a network model in the method embodiments shown in fig. 1 to 5.
Furthermore, an embodiment of the present invention provides a computer program product, including: a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps in the method for quantifying the blending accuracy of a network model in the method embodiments of fig. 1-5 described above.
Fig. 9 is a schematic structural diagram of another apparatus for quantizing hybrid precision of a network model according to an embodiment of the present application; referring to fig. 9, the present embodiment provides another mixed precision quantization apparatus for a network model, where the mixed precision quantization apparatus is configured to perform the mixed precision quantization method for the network model shown in fig. 6, and specifically, the mixed precision quantization apparatus may include:
a second determining module 31, configured to determine, in response to the request for invoking the precision quantization, a processing resource corresponding to a hybrid precision quantization service of the network model;
a second processing module 32, configured to perform the following steps with the processing resource: the method comprises the steps of obtaining a network model and a configured network model corresponding to the network model, wherein data of network layers in the network model is of a first data type, data of at least one network layer in the configured network model is of a second data type, and data processing precision corresponding to the second data type is lower than that corresponding to the first data type; generating calibration data corresponding to the network model; processing the calibration data by using a network model to obtain a first processing result; processing the calibration data by using the configured network model to obtain a second processing result; based on the first processing result and the second processing result, a sensitivity corresponding to at least one network layer in the network model is determined.
The apparatus shown in fig. 9 can perform the method of the embodiment shown in fig. 5-6, and the detailed description of this embodiment can refer to the related description of the embodiment shown in fig. 5-6. The implementation process and technical effect of the technical solution refer to the descriptions in the embodiments shown in fig. 5 to 6, and are not described herein again.
In one possible design, the structure of the hybrid precision quantifying device of the network model shown in fig. 9 may be implemented as an electronic device, which may be a mobile phone, a tablet computer, a server, or other devices. As shown in fig. 10, the electronic device may include: a second processor 41 and a second memory 42. Wherein the second memory 42 is used for storing a program of a corresponding electronic device for executing the hybrid precision quantification method of the network model provided in the embodiments shown in fig. 5-6, and the second processor 41 is configured for executing the program stored in the second memory 42.
The program comprises one or more computer instructions, wherein the one or more computer instructions, when executed by the second processor 41, are capable of performing the steps of: responding to the calling precision quantification request, and determining a processing resource corresponding to a mixed precision quantification service of the network model; performing the following steps with a processing resource: the method comprises the steps of obtaining a network model and a configured network model corresponding to the network model, wherein data of network layers in the network model is of a first data type, data of at least one network layer in the configured network model is of a second data type, and data processing precision corresponding to the second data type is lower than that corresponding to the first data type; generating calibration data corresponding to the network model; processing the calibration data by using a network model to obtain a first processing result; processing the calibration data by using the configured network model to obtain a second processing result; based on the first processing result and the second processing result, a sensitivity corresponding to at least one network layer in the network model is determined.
Further, the second processor 41 is also used to execute all or part of the steps in the embodiments shown in fig. 5-6.
The electronic device may further include a second communication interface 43 for communicating with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for an electronic device, which includes a program for executing the method for quantizing the blending precision of a network model in the method embodiments shown in fig. 5 to 6.
Furthermore, an embodiment of the present invention provides a computer program product, including: a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps in the method for quantifying the blending accuracy of a network model in the method embodiments of fig. 5-6 described above.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by adding a necessary general hardware platform, and of course, can also be implemented by a combination of hardware and software. With this understanding in mind, the above-described technical solutions and/or portions thereof that contribute to the prior art may be embodied in the form of a computer program product, which may be embodied on one or more computer-usable storage media having computer-usable program code embodied therein (including but not limited to disk storage, CD-ROM, optical storage, etc.).
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (16)

1. A method for quantizing the mixed precision of network models includes:
acquiring a network model and a configured network model corresponding to the network model, wherein data of network layers in the network model is of a first data type, data of at least one network layer in the configured network model is of a second data type, and data processing precision corresponding to the second data type is lower than that corresponding to the first data type;
generating calibration data corresponding to the network model;
processing the calibration data by using the network model to obtain a first processing result;
processing the calibration data by using the configured network model to obtain a second processing result;
determining a sensitivity corresponding to at least one network layer in the network model based on the first processing result and the second processing result.
2. The method of claim 1, obtaining a configured network model corresponding to the network model, comprising:
acquiring configuration parameters corresponding to at least one network layer in the network model;
performing integer quantization processing on at least one network layer in the network model based on the configuration parameters to obtain a configured network model, wherein data of at least one network layer in the configured network model is any one of the following data: INT8 integer data, INT4 integer data, INT16 integer data.
3. The method of claim 1, generating calibration data corresponding to the network model, comprising:
obtaining model training parameters corresponding to the network model;
and generating calibration data corresponding to the network model based on the model training parameters.
4. The method of claim 1, processing the calibration data using the network model to obtain a first processing result, comprising:
processing the calibration data by using the network model to obtain a first feature mean value and a first feature variance value of each feature channel output by the network;
and determining the first processing result based on the first feature mean value and the first feature variance value.
5. The method of claim 1, processing the calibration data using the configured network model to obtain a second processing result, comprising:
processing the calibration data by using the configured network model to obtain a second feature mean value and a second feature variance value of each feature channel output by the network;
and determining the second processing result based on the second feature mean value and the second feature variance value.
6. The method of claim 1, determining a sensitivity corresponding to at least one network layer in the network model based on the first and second processing results, comprising:
acquiring distance information between the first processing result and the second processing result;
based on the distance information, a sensitivity corresponding to at least one network layer in the network model is determined.
7. The method of claim 6, wherein a sensitivity corresponding to at least one network layer in the network model is positively correlated to the distance information.
8. The method of claim 1, after determining a sensitivity corresponding to at least one network layer in the network model, the method further comprising:
determining quantitative information corresponding to the network model based on a sensitivity corresponding to at least one network layer in the network model, the quantitative information comprising: recommendation information for performing a quantitative operation on a network model and a sensitivity corresponding to the recommendation information.
9. The method of claim 8, determining quantitative information corresponding to the network model based on a sensitivity corresponding to at least one network layer in the network model, comprising:
sequencing the obtained sensitivities corresponding to each network layer in the network model to obtain the sequenced sensitivities;
dividing the sorted sensitivities into a plurality of sensitivity sets;
based on the plurality of sensitivity sets, quantitative information corresponding to the network model is determined.
10. The method of claim 9, after determining quantitative information corresponding to the network model, the method further comprising:
acquiring target processing precision corresponding to the network model;
determining at least one target sensitivity set among the plurality of sensitivity sets based on the target processing precision and quantization information;
and quantifying the network layer corresponding to the at least one target sensitivity set in the network model to obtain a target network model.
11. The method of claim 9, dividing the ranked sensitivities into a plurality of sensitivity sets, comprising:
obtaining a variation amplitude of a first sensitivity and a variation amplitude of a second sensitivity based on the sorted sensitivities, wherein the first sensitivity is adjacent to the second sensitivity;
and dividing the sorted sensitivities into a plurality of sensitivity sets based on the variation amplitude of the first sensitivity and the variation amplitude of the second sensitivity.
12. The method of claim 11, dividing the ranked sensitivities into a plurality of sensitivity sets based on a magnitude of change in the first sensitivity and a magnitude of change in the second sensitivity, comprising:
acquiring a sensitivity change ratio between the change amplitude of the first sensitivity and the change amplitude of the second sensitivity;
and when the sensitivity change ratio is larger than or equal to a preset threshold value, dividing the sorted sensitivities based on the second sensitivity to obtain a plurality of sensitivity sets.
13. A method for quantizing the mixed precision of network models includes:
responding to the calling precision quantification request, and determining a processing resource corresponding to a mixed precision quantification service of the network model;
performing the following steps with the processing resource: acquiring a network model and a configured network model corresponding to the network model, wherein data of network layers in the network model is of a first data type, data of at least one network layer in the configured network model is of a second data type, and data processing precision corresponding to the second data type is lower than that corresponding to the first data type; generating calibration data corresponding to the network model; processing the calibration data by using the network model to obtain a first processing result; processing the calibration data by using the configured network model to obtain a second processing result; determining a sensitivity corresponding to at least one network layer in the network model based on the first processing result and the second processing result.
14. An electronic device, comprising: a memory, a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement a method of hybrid-accuracy quantification of a network model according to any one of claims 1-12.
15. A computer storage medium storing a computer program which causes a computer to implement a method of hybrid accuracy quantification of a network model according to any one of claims 1 to 12 when executed.
16. A computer program product, comprising: a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps in the method for hybrid accuracy quantification of a network model of any one of claims 1-12.
CN202111000718.7A 2021-08-30 2021-08-30 Method and device for quantifying mixing precision of network model and computer storage medium Pending CN113449854A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111000718.7A CN113449854A (en) 2021-08-30 2021-08-30 Method and device for quantifying mixing precision of network model and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111000718.7A CN113449854A (en) 2021-08-30 2021-08-30 Method and device for quantifying mixing precision of network model and computer storage medium

Publications (1)

Publication Number Publication Date
CN113449854A true CN113449854A (en) 2021-09-28

Family

ID=77818902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111000718.7A Pending CN113449854A (en) 2021-08-30 2021-08-30 Method and device for quantifying mixing precision of network model and computer storage medium

Country Status (1)

Country Link
CN (1) CN113449854A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114676797A (en) * 2022-05-27 2022-06-28 浙江大华技术股份有限公司 Model precision calculation method and device and computer readable storage medium
CN114861886A (en) * 2022-05-30 2022-08-05 阿波罗智能技术(北京)有限公司 Quantification method and device of neural network model
CN114861886B (en) * 2022-05-30 2023-03-10 阿波罗智能技术(北京)有限公司 Quantification method and device of neural network model
CN115034388A (en) * 2022-07-07 2022-09-09 北京百度网讯科技有限公司 Method and device for determining quantization parameters of sequencing model and electronic equipment
CN116189667A (en) * 2023-04-27 2023-05-30 摩尔线程智能科技(北京)有限责任公司 Quantization compression method, device, equipment and storage medium of voice processing model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210928)