CN115759238A - Method and device for generating quantization model, electronic equipment and storage medium

Method and device for generating quantization model, electronic equipment and storage medium

Info

Publication number
CN115759238A
Authority
CN
China
Prior art keywords
network layer
quantization
target
network
value
Prior art date
Legal status
Granted
Application number
CN202310005904.2A
Other languages
Chinese (zh)
Other versions
CN115759238B (en)
Inventor
刘艳
林金辉
王恒
石宇航
孙梦磊
杨思琪
Current Assignee
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date
Filing date
Publication date
Application filed by University of Science and Technology Beijing (USTB)
Priority to CN202310005904.2A
Publication of CN115759238A
Application granted
Publication of CN115759238B
Legal status: Active

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The disclosure relates to a method and a device for generating a quantization model, an electronic device, and a storage medium. The method includes: quantizing a pre-trained target detection model; determining the contribution degree of each network layer to the target detection model according to the difference between the output values of each network layer before and after quantization; grouping all network layers whose contribution degree is smaller than a preset threshold, and determining the network layers of each group; and quantizing each group of network layers in parallel with each network layer whose contribution degree is greater than or equal to the preset threshold, to obtain a quantization model corresponding to the target detection model. The multiple low-contribution network layers of each group and each single high-contribution network layer are quantized in parallel: quantizing the layers with larger contribution degree independently preserves the precision of the quantization process, while the parallel quantization improves its speed.

Description

Method and device for generating quantization model, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of neural network technologies, and in particular, to a method and an apparatus for generating a quantization model, an electronic device, and a storage medium.
Background
In the parameter quantization methods adopted for image classification model compression, the following problem exists: under a low-bit constraint (for example, 4 bits), few discrete values are available to represent the data effectively, so the precision of the quantized model is greatly degraded.
In the related art, a block-reconstruction quantization scheme is adopted, in which the parameters in each convolution block are updated using the output of that block, so as to combine local information with global information; quantization combined with knowledge distillation can also be used, guiding the quantization process with the output of each convolution layer and the output of the activation function.
However, these approaches quantize only layer by layer or block by block and do not balance well the influence of local information and global information on quantization, so the precision loss is severe.
Therefore, in the process of quantizing a deep neural network model, reducing the precision loss under a low-bit constraint is a problem to be solved urgently.
Disclosure of Invention
In order to solve the technical problem described above or at least partially solve the technical problem described above, embodiments of the present disclosure provide a method and an apparatus for generating a quantization model, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a method for generating a quantization model, where the method includes:
quantizing a pre-trained target detection model;
determining the contribution degree of each network layer to the target detection model according to the difference value of the output values of each network layer of the target detection model before and after quantization;
grouping all network layers whose contribution degree is smaller than a preset threshold, and determining the network layers of each group;
and carrying out parallel quantization on each group of network layers and each network layer with the contribution degree greater than or equal to a preset threshold value to obtain a quantization model corresponding to the target detection model.
In a possible implementation manner, the contribution degree of each network layer to the target detection model is determined according to the difference between the output values of each network layer of the target detection model before and after quantization, by an expression of the form:

S = Σ_i (x_i − x̂_i)²

wherein S is the contribution degree of the current network layer to the target detection model, x_i is the i-th output value of the current network layer before quantization, and x̂_i is the i-th output value of the current network layer after quantization.
In one possible embodiment, the quantizing of the pre-trained target detection model includes:
for each weight factor of each network layer of the pre-trained target detection model, reducing the current bit width to a target bit width, wherein the target bit width is smaller than the current bit width.
In a possible implementation manner, the grouping of all network layers whose contribution degree is smaller than a preset threshold, and determining the network layers of each group, includes:
sorting all network layers whose contribution degree is smaller than the preset threshold according to their order in the target detection model, to obtain the sorted network layers;
and grouping the sorted network layers and determining the network layers of each group, wherein the number of network layers in each group is a preset value.
In a possible embodiment, the performing of parallel quantization on each group of network layers and on each network layer whose contribution degree is greater than or equal to a preset threshold includes:
allocating a parallel thread to each network layer whose contribution degree is greater than or equal to the preset threshold, and quantizing each such network layer as a first target network layer;
and allocating a parallel thread to each group of network layers, and quantizing each network layer in each group as a second target network layer.
In one possible embodiment, the quantizing each network layer of each group as the second target network layer includes:
taking each network layer in each group of network layers as a second target network layer;
and quantizing the second target network layer in sequence according to the sequence of each group of network layers in the target detection model.
In one possible implementation, the quantizing of the second target network layer includes:
adjusting the extreme values (minimum and maximum) of the value ranges of all known parameters to be quantized of the second target network layer, and quantizing the second target network layer according to a plurality of groups of adjusted extreme values;
in the case that the second target network layer is the Mth network layer of the target detection model, for each group of adjusted extreme values used to quantize the Mth network layer, sequentially calculating the output values, before and after quantization of the Mth network layer, of each network layer from the Mth network layer to a third target network layer, according to the order of the network layers in the target detection model, wherein the third target network layer is the first network layer after the Mth network layer among the sorted network layers;
calculating the loss value of each network layer from the Mth network layer to the third target network layer according to its output values before and after quantization of the Mth network layer;
ordering the loss values of the network layers from the Mth network layer to the third target network layer according to the order of the network layers in the target detection model, and determining a loss value inflection point from the ordered loss values;
taking the loss value inflection point as the termination point of the quantization process of the current group of adjusted extreme values for the Mth network layer;
and comparing the loss values at the inflection points corresponding to each group of adjusted extreme values, and taking the group of adjusted extreme values with the smallest loss value as the target extreme values.
In a possible implementation, the quantizing of each network layer whose contribution degree is greater than or equal to a preset threshold as a first target network layer includes:
taking each network layer whose contribution degree is greater than or equal to the preset threshold as a first target network layer;
adjusting the extreme values of the value ranges of all known parameters to be quantized of the first target network layer, and quantizing the first target network layer according to a plurality of groups of adjusted extreme values;
in the case that the first target network layer is the Nth network layer of the target detection model, selecting the target extreme values from the plurality of groups of adjusted extreme values according to the output values of the Nth network layer and of the network layers after it, before and after each quantization of the Nth network layer;
and using the target extreme values for the Nth network layer of the quantization model.
In a possible embodiment, the adjusting of the extreme values of the value ranges of all known parameters to be quantized of the first target network layer includes:
determining the value ranges of all known parameters to be quantized according to the initial extreme values of all known parameters to be quantized of the first target network layer;
narrowing the value ranges of all known parameters to be quantized according to a preset step length;
and determining each group of adjusted extreme values according to the value range after each narrowing.
In a possible implementation, the quantizing of the first target network layer according to the plurality of groups of adjusted extreme values includes:
calculating a quantization step and a quantization zero point according to each group of adjusted extreme values;
and quantizing the parameters to be quantized of the Nth network layer based on the quantization step and the quantization zero point.
In a possible embodiment, in the case that the first target network layer is the Nth network layer of the target detection model, the selecting of the target extreme values from the plurality of groups of adjusted extreme values according to the output values of the Nth network layer and of the network layers after it, before and after each quantization of the Nth network layer, includes:
for each group of adjusted extreme values used to quantize the Nth network layer, sequentially calculating the output values of the Nth network layer and of the network layers after it, before and after quantization of the Nth network layer, according to the order of the network layers in the target detection model;
calculating the loss values of the Nth network layer and of the network layers after it according to their output values before and after quantization;
ordering the loss values of the Nth network layer and of the network layers after it according to the order of the network layers in the target detection model, and determining a loss value inflection point from the ordered loss values;
taking the loss value inflection point as the termination point of the quantization process of the current group of adjusted extreme values for the Nth network layer;
and comparing the loss values at the inflection points corresponding to each group of adjusted extreme values, and taking the group of adjusted extreme values with the smallest loss value as the target extreme values.
In a second aspect, an embodiment of the present disclosure provides an apparatus for generating a quantization model, including:
the first quantization module is used for quantizing a pre-trained target detection model;
the first determining module is used for determining the contribution degree of each network layer to the target detection model according to the difference value of the output values of each network layer of the target detection model before and after quantization;
the second determining module is used for grouping all network layers whose contribution degree is smaller than a preset threshold and determining the network layers of each group;
and the second quantization module is used for performing parallel quantization on each group of network layers and each network layer with the contribution degree greater than or equal to a preset threshold value to obtain a quantization model corresponding to the target detection model.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and a processor for realizing the method for generating the quantization model when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for generating a quantization model described above.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure at least has part or all of the following advantages:
the method for generating the quantitative model, which is disclosed by the embodiment of the disclosure, quantifies a pre-trained target detection model; determining the contribution degree of each network layer to the target detection model according to the difference value of the output values of each network layer of the target detection model before and after quantization; grouping all network layers with contribution degrees smaller than a preset threshold value, and determining the network layer of each group; the network layers of each group and the network layers with the contribution degree larger than or equal to the preset threshold are quantized in parallel to obtain the quantization model corresponding to the target detection model, all the network layers are divided according to the contribution degree of each network layer, all the network layers with the contribution degree smaller than the preset threshold are grouped, a plurality of network layers with the contribution degree smaller than the preset threshold of each group and a single network layer with the contribution degree larger than or equal to the preset threshold are quantized in parallel, and the speed of the quantization process can be improved while the precision of the quantization process is ensured.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the related art are briefly described below; other drawings can be obtained by those skilled in the art from these drawings without inventive effort.
Fig. 1 schematically illustrates a flow diagram of a method of generating a quantization model according to an embodiment of the present disclosure;
FIG. 2 schematically shows a schematic diagram of a distribution of contribution degrees of a network layer of a model according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a sliding window schematic in a quantization process according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of an apparatus for generation of a quantization model according to an embodiment of the present disclosure; and
fig. 5 schematically shows a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some embodiments of the present disclosure, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
Referring to fig. 1, an embodiment of the present disclosure provides a method for generating a quantization model, the method including:
s1, quantizing a pre-trained target detection model;
in some embodiments, the pre-trained target detection model may be an image classification model or a face recognition model, among others.
S2, determining the contribution degree of each network layer to the target detection model according to the difference value of the output values of each network layer of the target detection model before and after quantization;
in some embodiments, in the case where the object detection model includes 50 network layers, the contribution degree distribution of the 50 network layers is as shown in fig. 2. In practical applications, the number of network layers in the target detection model is not limited.
S3, grouping all network layers whose contribution degree is smaller than a preset threshold, and determining the network layers of each group;
in some embodiments, the preset threshold may be a preset value or an average value of the calculated contribution degrees of all network layers.
And S4, carrying out parallel quantization on each group of network layers and each network layer with the contribution degree larger than or equal to a preset threshold value to obtain a quantization model corresponding to the target detection model.
In this embodiment, in step S2, the contribution degree of each network layer to the target detection model is determined according to the difference between the output values of each network layer of the target detection model before and after quantization, by an expression of the form:

S = Σ_i (x_i − x̂_i)²

wherein S is the contribution degree of the current network layer to the target detection model, x_i is the i-th output value of the current network layer before quantization, and x̂_i is the i-th output value of the current network layer after quantization.
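By way of illustration only, a minimal Python sketch of this per-layer contribution computation, assuming the sum-of-squared-differences form reconstructed above and random stand-in arrays for each layer's outputs (the patent does not prescribe an implementation):

    import numpy as np

    def layer_contribution(float_out: np.ndarray, quant_out: np.ndarray) -> float:
        # Contribution degree S of one layer: squared differences between its
        # pre- and post-quantization outputs, summed over all output values.
        return float(np.sum((float_out - quant_out) ** 2))

    rng = np.random.default_rng(0)
    float_outputs = [rng.normal(size=256) for _ in range(50)]  # outputs before quantization
    quant_outputs = [o + rng.normal(scale=0.01, size=256) for o in float_outputs]  # after

    contributions = [layer_contribution(f, q) for f, q in zip(float_outputs, quant_outputs)]
    threshold = float(np.mean(contributions))  # one option for the preset threshold (the mean)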
In this embodiment, in step S1, the quantizing of the pre-trained target detection model includes:
for each weight factor of each network layer of the pre-trained target detection model, reducing the current bit width to a target bit width, wherein the target bit width is smaller than the current bit width. In practical applications, the target bit width may be 4 bits.
In this embodiment, in step S3, the grouping of all network layers whose contribution degree is smaller than the preset threshold, and determining the network layers of each group, includes:
sorting all network layers whose contribution degree is smaller than the preset threshold according to their order in the target detection model, to obtain the sorted network layers;
and grouping the sorted network layers and determining the network layers of each group, wherein the number of network layers in each group is a preset value.
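Continuing the sketch above (contributions and threshold as computed there; the group size of 4 is a hypothetical preset value), the sorting-and-grouping step might look like:

    GROUP_SIZE = 4  # hypothetical preset number of network layers per group

    def group_low_contribution_layers(contributions, threshold, group_size=GROUP_SIZE):
        # Sort the indices of low-contribution layers by their order in the model,
        # then split them into consecutive groups of the preset size.
        low = sorted(i for i, s in enumerate(contributions) if s < threshold)
        return [low[k:k + group_size] for k in range(0, len(low), group_size)]

    groups = group_low_contribution_layers(contributions, threshold)
    high_layers = [i for i, s in enumerate(contributions) if s >= threshold]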
In this embodiment, in step S4, the performing of parallel quantization on each group of network layers and on each network layer whose contribution degree is greater than or equal to the preset threshold includes:
allocating a parallel thread to each network layer whose contribution degree is greater than or equal to the preset threshold, and quantizing each such network layer as a first target network layer;
and allocating a parallel thread to each group of network layers, and quantizing each network layer in each group as a second target network layer.
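One way this thread allocation could be sketched with Python's standard concurrent.futures, continuing from the grouping above; quantize_layer and quantize_group are placeholders for the per-layer and per-group routines described below, not part of the patent:

    from concurrent.futures import ThreadPoolExecutor

    def quantize_layer(layer_idx):
        # Placeholder: quantize one high-contribution layer independently.
        ...

    def quantize_group(layer_indices):
        # Placeholder: quantize a group's layers one by one, in model order.
        for idx in sorted(layer_indices):
            quantize_layer(idx)

    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(quantize_layer, i) for i in high_layers]  # one thread per layer
        futures += [pool.submit(quantize_group, g) for g in groups]      # one thread per group
        for f in futures:
            f.result()  # wait and propagate any exception raised in a worker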
In this embodiment, the quantizing each network layer in each group of network layers as the second target network layer includes:
taking each network layer in each group of network layers as a second target network layer;
and quantizing the second target network layer in sequence according to the sequence of each group of network layers in the target detection model.
In this embodiment, the quantizing of the second target network layer includes:
adjusting the extreme values of the value ranges of all known parameters to be quantized of the second target network layer, and quantizing the second target network layer according to a plurality of groups of adjusted extreme values;
in the case that the second target network layer is the Mth network layer of the target detection model, for each group of adjusted extreme values used to quantize the Mth network layer, sequentially calculating the output values, before and after quantization of the Mth network layer, of each network layer from the Mth network layer to a third target network layer, according to the order of the network layers in the target detection model, wherein the third target network layer is the first network layer after the Mth network layer among the sorted network layers;
calculating the loss value of each network layer from the Mth network layer to the third target network layer according to its output values before and after quantization of the Mth network layer;
ordering the loss values of the network layers from the Mth network layer to the third target network layer according to the order of the network layers in the target detection model, and determining a loss value inflection point from the ordered loss values;
taking the loss value inflection point as the termination point of the quantization process of the current group of adjusted extreme values for the Mth network layer;
and comparing the loss values at the inflection points corresponding to each group of adjusted extreme values, and taking the group of adjusted extreme values with the smallest loss value as the target extreme values.
As shown in fig. 2, the sorted network layers may be 2, 3, 5, 6, 7, 8, …. Assuming that M is 3, it is only necessary to calculate the loss values of the 3rd network layer and the 4th network layer, and to take the smaller of the two loss values as the loss value inflection point.
In some embodiments, the loss value of each network layer from the Mth network layer to the third target network layer is calculated from that network layer's output values before and after quantization of the Mth network layer, by an expression of the form:

L = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²

wherein L is the loss value of the current network layer, n is the number of output values of the current network layer, y_i is the i-th output value of the current network layer before quantization of the Mth network layer, and ŷ_i is the i-th output value of the current network layer after quantization of the Mth network layer.
In some embodiments, the loss value inflection point may be determined by:
drawing a curve of loss value versus network layer number, with the loss value as the ordinate and the network layer number as the abscissa;
fitting a relation function to the loss value–network layer number curve;
and taking the loss value at which the derivative of the relation function is 0 as the loss value inflection point.
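A sketch of this inflection-point search; the cubic degree of the fitted function is an assumption, since the patent only states that a relation function is fitted:

    import numpy as np

    def loss_inflection_point(losses):
        # Fit loss vs. layer index with a cubic polynomial, then return the layer
        # (and its loss) where the derivative of the fit first crosses zero.
        x = np.arange(len(losses), dtype=float)
        fit = np.poly1d(np.polyfit(x, losses, deg=3))   # fitted relation function
        deriv = np.polyder(fit)
        roots = [r.real for r in deriv.roots
                 if abs(r.imag) < 1e-9 and 0 <= r.real <= x[-1]]
        if not roots:                                   # no zero derivative in range
            return len(losses) - 1, losses[-1]
        k = int(round(min(roots)))
        return k, losses[k]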
In some embodiments, each group of adjusted extreme values used to quantize the 1st network layer corresponds to one quantization process. In each quantization process, sequentially calculating the output values of each network layer from the 1st network layer to the third target network layer, before and after quantization of the 1st network layer, may be implemented with a sliding window. Specifically, when calculating the output values of the 1st network layer before and after its quantization, the sliding window framed by the dashed line contains only the 1st network layer (layer-1); when calculating the output values of the 2nd network layer before and after quantization of the 1st network layer, the sliding window framed by the solid line contains the 1st and 2nd network layers (layer-1 and layer-2), as shown in fig. 3.
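The sliding-window evaluation could be sketched as follows, treating a "model" as a plain list of callables purely for illustration; forward_upto is a hypothetical helper, not part of the patent, and the MSE loss form assumed above is reused:

    import numpy as np

    def forward_upto(layers, x, end):
        # Run input x through layers[0..end] and return the output of layer `end`.
        for layer in layers[:end + 1]:
            x = layer(x)
        return x

    def window_losses(float_layers, quantized_layers, calib_input, m, third_target):
        # For one candidate quantization of layer m, grow the sliding window from
        # layer m to the third target layer and record each layer's MSE loss.
        losses = []
        for end in range(m, third_target + 1):   # windows {m}, {m, m+1}, ...
            y_f = forward_upto(float_layers, calib_input, end)
            y_q = forward_upto(quantized_layers, calib_input, end)
            losses.append(float(np.mean((y_f - y_q) ** 2)))
        return losses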
In this embodiment, the quantizing of each network layer whose contribution degree is greater than or equal to the preset threshold as the first target network layer includes:
taking each network layer whose contribution degree is greater than or equal to the preset threshold as a first target network layer;
adjusting the extreme values of the value ranges of all known parameters to be quantized of the first target network layer, and quantizing the first target network layer according to a plurality of groups of adjusted extreme values;
in the case that the first target network layer is the Nth network layer of the target detection model, selecting the target extreme values from the plurality of groups of adjusted extreme values according to the output values of the Nth network layer and of the network layers after it, before and after each quantization of the Nth network layer;
and using the target extreme values for the Nth network layer of the quantization model.
In this embodiment, the adjusting of the extreme values of the value ranges of all known parameters to be quantized of the first target network layer includes:
determining the value ranges of all known parameters to be quantized according to the initial extreme values of all known parameters to be quantized of the first target network layer;
narrowing the value ranges of all known parameters to be quantized according to a preset step length;
and determining each group of adjusted extreme values according to the value range after each narrowing.
In practical applications, the preset step length may be a preset value, or may be 10% of the difference between the maximum and minimum of the value range.
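A sketch of this range-narrowing schedule; the symmetric narrowing and the number of candidate groups are illustrative assumptions:

    def candidate_extremes(w_min, w_max, num_groups=8):
        # Narrow [w_min, w_max] by a preset step per group (here 10% of the
        # initial range, split evenly between both ends) and return each
        # group's adjusted (min, max) pair.
        step = 0.10 * (w_max - w_min)
        pairs = []
        for k in range(num_groups):
            lo, hi = w_min + k * step / 2, w_max - k * step / 2
            if lo >= hi:
                break
            pairs.append((lo, hi))
        return pairs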
In this embodiment, the quantizing of the first target network layer according to the plurality of groups of adjusted extreme values includes:
calculating a quantization step and a quantization zero point according to each group of adjusted extreme values;
and quantizing the parameters to be quantized of the Nth network layer based on the quantization step and the quantization zero point.
In practical applications, the quantization step is calculated by the following expression:
s = [max(x) − min(x)] / (2^bit − 1)
wherein s is the quantization step, max(x) and min(x) are the maximum and minimum of the current group of adjusted extreme values, and bit is the target bit width.
In practical applications, the quantization zero point is calculated by the following expression:
z = round(−min(x) / s)
wherein z is the quantization zero point, min(x) is the minimum of the current group of adjusted extreme values, and s is the quantization step.
In practical applications, the parameters to be quantized of the Nth network layer are quantized based on the quantization step and the quantization zero point by expressions of the form:

x_q = clamp(round(x / s) + z, 0, 2^b − 1)
x̂ = s · (x_q − z)

wherein x is the weight factor before quantization, x_q is the quantized weight factor, x̂ is the de-quantized weight factor, s is the quantization step, z is the quantization zero point, and b is the target bit width of the weight factors.
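Putting the quantization step, zero point, and rounding expressions together, a minimal sketch assuming unsigned b-bit uniform quantization as reconstructed above:

    import numpy as np

    def quantize_params(w: np.ndarray, lo: float, hi: float, bit: int = 4):
        # One group of adjusted extreme values (lo, hi) gives the step s and the
        # zero point z; weights are then rounded, shifted, and clamped to b bits.
        s = (hi - lo) / (2 ** bit - 1)           # quantization step
        z = int(round(-lo / s))                  # quantization zero point
        q = np.clip(np.round(w / s) + z, 0, 2 ** bit - 1).astype(np.int32)
        w_hat = s * (q.astype(np.float32) - z)   # de-quantized weights
        return q, w_hat

    # Example: quantize random weights to 4 bits using their own min/max.
    w = np.random.default_rng(1).normal(size=64).astype(np.float32)
    q, w_hat = quantize_params(w, float(w.min()), float(w.max()), bit=4)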
In this embodiment, in the case that the first target network layer is the Nth network layer of the target detection model, the selecting of the target extreme values from the plurality of groups of adjusted extreme values according to the output values of the Nth network layer and of the network layers after it, before and after each quantization of the Nth network layer, includes:
for each group of adjusted extreme values used to quantize the Nth network layer, sequentially calculating the output values of the Nth network layer and of the network layers after it, before and after quantization of the Nth network layer, according to the order of the network layers in the target detection model;
calculating the loss values of the Nth network layer and of the network layers after it according to their output values before and after quantization;
ordering the loss values of the Nth network layer and of the network layers after it according to the order of the network layers in the target detection model, and determining a loss value inflection point from the ordered loss values;
taking the loss value inflection point as the termination point of the quantization process of the current group of adjusted extreme values for the Nth network layer;
and comparing the loss values at the inflection points corresponding to each group of adjusted extreme values, and taking the group of adjusted extreme values with the smallest loss value as the target extreme values.
The method provides a sliding-window mode to guide model quantization. To address the long processing time of the sliding window, a dynamic quantization mode is further provided: network layers with low contribution degree are quantized synchronously on multiple threads, which improves the quantization efficiency of the sliding window and thus balances the precision of the quantized model against the speed of the quantization process.
In the prior art, the sliding window of each network layer traverses from the current layer to the last layer, and the time complexity consumed is T × N!, wherein T represents the time to search one layer and N! represents the factorial of N.
Referring to fig. 4, an embodiment of the present disclosure provides a generation apparatus of a quantization model, including:
a first quantization module 41, configured to quantize a pre-trained target detection model;
a first determining module 42, configured to determine a contribution degree of each network layer to the target detection model according to a difference between output values of each network layer of the target detection model before quantization and output values of each network layer after quantization;
a second determining module 43, configured to group all network layers whose contribution degree is smaller than a preset threshold, and to determine the network layers of each group;
and a second quantization module 44, configured to perform parallel quantization on each group of network layers and each network layer whose contribution is greater than or equal to a preset threshold, so as to obtain a quantization model corresponding to the target detection model.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
In the second embodiment, any number of the first quantization module 41, the first determination module 42, the second determination module 43, and the second quantization module 44 may be combined and implemented in one module, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. At least one of the first quantization module 41, the first determination module 42, the second determination module 43 and the second quantization module 44 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of three implementations of software, hardware and firmware, or in any suitable combination of any of them. Alternatively, at least one of the first quantization module 41, the first determination module 42, the second determination module 43 and the second quantization module 44 may be at least partly implemented as a computer program module, which when executed may perform a corresponding function.
Referring to fig. 5, an electronic device provided by an embodiment of the present disclosure includes a processor 1110, a communication interface 1120, a memory 1130, and a communication bus 1140, where the processor 1110, the communication interface 1120, and the memory 1130 complete communication with each other through the communication bus 1140;
a memory 1130 for storing computer programs;
the processor 1110, when executing the program stored in the memory 1130, implements a method of generating a quantization model as follows:
quantizing a pre-trained target detection model;
determining the contribution degree of each network layer to the target detection model according to the difference value of the output values of each network layer of the target detection model before and after quantization;
grouping all network layers whose contribution degree is smaller than a preset threshold, and determining the network layers of each group;
and carrying out parallel quantization on each group of network layers and each network layer with the contribution degree greater than or equal to a preset threshold value to obtain a quantization model corresponding to the target detection model.
The communication bus 1140 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 1140 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this is not intended to represent only one bus or type of bus.
The communication interface 1120 is used for communication between the electronic device and other devices.
The Memory 1130 may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory 1130 may also be at least one memory device located remotely from the processor 1110.
The Processor 1110 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the Integrated Circuit may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.
Embodiments of the present disclosure also provide a computer-readable storage medium. The above-mentioned computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the method of generating a quantization model as described above.
The computer-readable storage medium may be contained in the apparatus/device described in the above embodiments; or may be present alone without being assembled into the device/apparatus. The above-described computer-readable storage medium carries one or more programs which, when executed, implement a method of generating a quantization model according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A method of generating a quantization model, the method comprising:
quantizing a pre-trained target detection model;
determining the contribution degree of each network layer to the target detection model according to the difference value of the output values of each network layer of the target detection model before and after quantization;
grouping all network layers whose contribution degree is smaller than a preset threshold, and determining the network layers of each group;
and carrying out parallel quantization on each group of network layers and each network layer with the contribution degree greater than or equal to a preset threshold value to obtain a quantization model corresponding to the target detection model.
2. The method of claim 1, wherein the contribution degree of each network layer to the target detection model is determined according to the difference between the output values of each network layer of the target detection model before and after quantization, by an expression of the form:

S = Σ_i (x_i − x̂_i)²

wherein S is the contribution degree of the current network layer to the target detection model, x_i is the i-th output value of the current network layer before quantization, and x̂_i is the i-th output value of the current network layer after quantization.
3. The method of claim 1, wherein the quantizing of the pre-trained target detection model comprises:
for each weight factor of each network layer of the pre-trained target detection model, reducing the current bit width to a target bit width, wherein the target bit width is smaller than the current bit width.
4. The method according to claim 1, wherein the grouping of all network layers whose contribution degree is smaller than a preset threshold, and determining the network layers of each group, comprises:
sorting all network layers whose contribution degree is smaller than the preset threshold according to their order in the target detection model, to obtain the sorted network layers;
and grouping the sorted network layers and determining the network layers of each group, wherein the number of network layers in each group is a preset value.
5. The method according to claim 1, wherein the performing of parallel quantization on each group of network layers and on each network layer whose contribution degree is greater than or equal to a preset threshold comprises:
allocating a parallel thread to each network layer whose contribution degree is greater than or equal to the preset threshold, and quantizing each such network layer as a first target network layer;
and allocating a parallel thread to each group of network layers, and quantizing each network layer in each group as a second target network layer.
6. The method of claim 5, wherein quantizing each network layer of each group as a second target network layer comprises:
taking each network layer in each group of network layers as a second target network layer;
and quantizing the second target network layer in sequence according to the sequence of each group of network layers in the target detection model.
7. The method of claim 6, wherein the quantizing of the second target network layer comprises:
adjusting the extreme values of the value ranges of all known parameters to be quantized of the second target network layer, and quantizing the second target network layer according to a plurality of groups of adjusted extreme values;
in the case that the second target network layer is the Mth network layer of the target detection model, for each group of adjusted extreme values used to quantize the Mth network layer, sequentially calculating the output values, before and after quantization of the Mth network layer, of each network layer from the Mth network layer to a third target network layer, according to the order of the network layers in the target detection model, wherein the third target network layer is the first network layer after the Mth network layer among the sorted network layers;
calculating the loss value of each network layer from the Mth network layer to the third target network layer according to its output values before and after quantization of the Mth network layer;
ordering the loss values of the network layers from the Mth network layer to the third target network layer according to the order of the network layers in the target detection model, and determining a loss value inflection point from the ordered loss values;
taking the loss value inflection point as the termination point of the quantization process of the current group of adjusted extreme values for the Mth network layer;
and comparing the loss values at the inflection points corresponding to each group of adjusted extreme values, and taking the group of adjusted extreme values with the smallest loss value as the target extreme values.
8. The method according to claim 5, wherein the quantizing of each network layer whose contribution degree is greater than or equal to a preset threshold as a first target network layer comprises:
taking each network layer whose contribution degree is greater than or equal to the preset threshold as a first target network layer;
adjusting the extreme values of the value ranges of all known parameters to be quantized of the first target network layer, and quantizing the first target network layer according to a plurality of groups of adjusted extreme values;
in the case that the first target network layer is the Nth network layer of the target detection model, selecting the target extreme values from the plurality of groups of adjusted extreme values according to the output values of the Nth network layer and of the network layers after it, before and after each quantization of the Nth network layer;
and using the target extreme values for the Nth network layer of the quantization model.
9. The method of claim 8, wherein the adjusting of the extreme values of the value ranges of all known parameters to be quantized of the first target network layer comprises:
determining the value ranges of all known parameters to be quantized according to the initial extreme values of all known parameters to be quantized of the first target network layer;
narrowing the value ranges of all known parameters to be quantized according to a preset step length;
and determining each group of adjusted extreme values according to the value range after each narrowing.
10. The method of claim 8, wherein the quantizing of the first target network layer according to the plurality of groups of adjusted extreme values comprises:
calculating a quantization step and a quantization zero point according to each group of adjusted extreme values;
and quantizing the parameters to be quantized of the Nth network layer based on the quantization step and the quantization zero point.
11. The method according to claim 8, wherein, in the case that the first target network layer is the Nth network layer of the target detection model, the selecting of the target extreme values from the plurality of groups of adjusted extreme values according to the output values of the Nth network layer and of the network layers after it, before and after each quantization of the Nth network layer, comprises:
for each group of adjusted extreme values used to quantize the Nth network layer, sequentially calculating the output values of the Nth network layer and of the network layers after it, before and after quantization of the Nth network layer, according to the order of the network layers in the target detection model;
calculating the loss values of the Nth network layer and of the network layers after it according to their output values before and after quantization;
ordering the loss values of the Nth network layer and of the network layers after it according to the order of the network layers in the target detection model, and determining a loss value inflection point from the ordered loss values;
taking the loss value inflection point as the termination point of the quantization process of the current group of adjusted extreme values for the Nth network layer;
and comparing the loss values at the inflection points corresponding to each group of adjusted extreme values, and taking the group of adjusted extreme values with the smallest loss value as the target extreme values.
12. An apparatus for generating a quantization model, comprising:
the first quantization module is used for quantizing a pre-trained target detection model;
the first determining module is used for determining the contribution degree of each network layer to the target detection model according to the difference value of the output values of each network layer of the target detection model before and after quantization;
the second determining module is used for grouping all network layers whose contribution degree is smaller than a preset threshold and determining the network layers of each group;
and the second quantization module is used for performing parallel quantization on each group of network layers and each network layer with the contribution degree greater than or equal to a preset threshold value to obtain a quantization model corresponding to the target detection model.
13. An electronic device is characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method of generating a quantitative model as claimed in any one of claims 1 to 11 when executing a program stored in a memory.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method of generating a quantization model according to any one of claims 1-11.
CN202310005904.2A 2023-01-04 2023-01-04 Quantization model generation method and device, electronic equipment and storage medium Active CN115759238B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310005904.2A CN115759238B (en) 2023-01-04 2023-01-04 Quantization model generation method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115759238A true CN115759238A (en) 2023-03-07
CN115759238B CN115759238B (en) 2023-08-11

Family

ID=85348182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310005904.2A Active CN115759238B (en) 2023-01-04 2023-01-04 Quantization model generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115759238B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738401A (en) * 2019-03-25 2020-10-02 北京三星通信技术研究有限公司 Model optimization method, grouping compression method, corresponding device and equipment
US20220309321A1 (en) * 2021-03-24 2022-09-29 Panasonic Intellectual Property Management Co., Ltd. Quantization method, quantization device, and recording medium
CN112926570A (en) * 2021-03-26 2021-06-08 上海交通大学 Adaptive bit network quantization method, system and image processing method
CN113673532A (en) * 2021-10-21 2021-11-19 北京科技大学 Optimization method and device of quantization model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUO Lili; GAO Fei: "Optimal distributed quantization detection algorithm based on progressive performance analysis", Journal of Central South University (Science and Technology), no. 12, pp. 4529-4534 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116739039A (en) * 2023-05-05 2023-09-12 北京百度网讯科技有限公司 Quantization method, device, equipment and medium of distributed deployment model

Also Published As

Publication number Publication date
CN115759238B (en) 2023-08-11

Similar Documents

Publication Publication Date Title
CN109840589B (en) Method and device for operating convolutional neural network on FPGA
CN111444009B (en) Resource allocation method and device based on deep reinforcement learning
CN106855952B (en) Neural network-based computing method and device
TW202119293A (en) Method and system of quantizing artificial neural network and arti ficial neural network apparatus
CN111625816A (en) Intrusion detection method and device
CN111091184B (en) Deep neural network quantification method and device, electronic equipment and medium
CN113705775A (en) Neural network pruning method, device, equipment and storage medium
US20200184245A1 (en) Improper neural network input detection and handling
CN115759238A (en) Method and device for generating quantization model, electronic equipment and storage medium
CN111881972B (en) Black-out user identification method and device, server and storage medium
CN111563589A (en) Quantification method and device of neural network model
CN113449854A (en) Method and device for quantifying mixing precision of network model and computer storage medium
CN112150497A (en) Local activation method and system based on binary neural network
WO2021012148A1 (en) Data processing method and apparatus based on deep neural network, and mobile device
CN111027684A (en) Deep learning model quantification method and device, electronic equipment and storage medium
CN111159169B (en) Data management method and equipment
CN113408704A (en) Data processing method, device, equipment and computer readable storage medium
CN112561050B (en) Neural network model training method and device
CN114841325A (en) Data processing method and medium of neural network model and electronic device
CN114662485A (en) Translation model compression method, translation method and related device
CN111598233A (en) Compression method, device and equipment of deep learning model
CN112668702B (en) Fixed-point parameter optimization method, system, terminal and storage medium
CN111290850B (en) Data storage method, device and equipment
CN117348837A (en) Quantization method and device for floating point precision model, electronic equipment and storage medium
CN115630692A (en) Method and device for adjusting calculation precision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant