CN115759238A - Method and device for generating quantization model, electronic equipment and storage medium

Method and device for generating quantization model, electronic equipment and storage medium

Info

Publication number
CN115759238A
Authority
CN
China
Prior art keywords
network layer
quantization
target
network
value
Prior art date
Legal status
Granted
Application number
CN202310005904.2A
Other languages
Chinese (zh)
Other versions
CN115759238B (en)
Inventor
刘艳
林金辉
王恒
石宇航
孙梦磊
杨思琪
Current Assignee
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date
Filing date
Publication date
Application filed by University of Science and Technology Beijing (USTB)
Priority to CN202310005904.2A
Publication of CN115759238A
Application granted
Publication of CN115759238B
Legal status: Active

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The disclosure relates to a method and a device for generating a quantization model, an electronic device, and a storage medium. The method includes: quantizing a pre-trained target detection model; determining the contribution degree of each network layer to the target detection model according to the difference between the output values of each network layer before and after quantization; grouping all network layers whose contribution degree is smaller than a preset threshold, and determining the network layers of each group; and quantizing each group of network layers in parallel with each network layer whose contribution degree is greater than or equal to the preset threshold, to obtain a quantization model corresponding to the target detection model. The multiple low-contribution network layers of each group and each single high-contribution network layer are quantized in parallel: quantizing the layers with larger contribution degree independently preserves the precision of the quantization process, while the parallel quantization improves its speed.

Description

Method and device for generating quantization model, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of neural network technologies, and in particular, to a method and an apparatus for generating a quantization model, an electronic device, and a storage medium.
Background
In the parameter quantization methods adopted for image classification model compression, the following problem exists: under a low-bit constraint (for example, 4 bits), few discrete values are available to represent the data effectively, so the precision of the quantized model is greatly degraded.
In the related art, a block-reconstruction quantization scheme is adopted, in which the parameters in each convolution block are updated using the output of that block, so as to combine local information with global information; quantization combined with knowledge distillation can also be used, guiding the quantization process with the output of each convolution layer and the output of the activation function.
However, these approaches quantize only layer by layer or block by block and do not balance well the influence of local information and global information on quantization, so the precision loss is severe.
Therefore, in the process of quantizing a deep neural network model, reducing the precision loss under a low-bit constraint is a problem to be solved urgently.
Disclosure of Invention
In order to solve the technical problem described above or at least partially solve the technical problem described above, embodiments of the present disclosure provide a method and an apparatus for generating a quantization model, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a method for generating a quantization model, where the method includes:
quantizing a pre-trained target detection model;
determining the contribution degree of each network layer to the target detection model according to the difference value of the output values of each network layer of the target detection model before and after quantization;
grouping all network layers whose contribution degree is smaller than a preset threshold, and determining the network layers of each group;
and carrying out parallel quantization on each group of network layers and each network layer with the contribution degree greater than or equal to a preset threshold value to obtain a quantization model corresponding to the target detection model.
In a possible implementation manner, the contribution degree of each network layer to the target detection model is determined according to the difference between the output values of each network layer of the target detection model before and after quantization, by an expression of the form:

S = Σ_i (x_i − x̂_i)²

wherein S is the contribution degree of the current network layer to the target detection model, x_i is the i-th output value of the current network layer before quantization, and x̂_i is the i-th output value of the current network layer after quantization.
In one possible embodiment, the quantizing of the pre-trained target detection model includes:
for each weight factor of each network layer of the pre-trained target detection model, reducing the current bit width to a target bit width, wherein the target bit width is smaller than the current bit width.
In a possible implementation manner, the grouping of all network layers whose contribution degree is smaller than a preset threshold, and determining the network layers of each group, includes:
sorting all network layers whose contribution degree is smaller than the preset threshold according to their order in the target detection model, to obtain the sorted network layers;
and grouping the sorted network layers and determining the network layers of each group, wherein the number of network layers in each group is a preset value.
In a possible embodiment, the performing of parallel quantization on each group of network layers and on each network layer whose contribution degree is greater than or equal to a preset threshold includes:
allocating a parallel thread to each network layer whose contribution degree is greater than or equal to the preset threshold, and quantizing each such network layer as a first target network layer;
and allocating a parallel thread to each group of network layers, and quantizing each network layer in each group as a second target network layer.
In one possible embodiment, the quantizing each network layer of each group as the second target network layer includes:
taking each network layer in each group of network layers as a second target network layer;
and quantizing the second target network layer in sequence according to the sequence of each group of network layers in the target detection model.
In one possible implementation, the quantizing of the second target network layer includes:
adjusting the extreme values (minimum and maximum) of the value ranges of all known parameters to be quantized of the second target network layer, and quantizing the second target network layer according to a plurality of groups of adjusted extreme values;
in the case that the second target network layer is the Mth network layer of the target detection model, for each group of adjusted extreme values used to quantize the Mth network layer, sequentially calculating the output values, before and after quantization of the Mth network layer, of each network layer from the Mth network layer to a third target network layer, according to the order of the network layers in the target detection model, wherein the third target network layer is the first network layer after the Mth network layer among the sorted network layers;
calculating the loss value of each network layer from the Mth network layer to the third target network layer according to its output values before and after quantization of the Mth network layer;
ordering the loss values of the network layers from the Mth network layer to the third target network layer according to the order of the network layers in the target detection model, and determining a loss value inflection point from the ordered loss values;
taking the loss value inflection point as the termination point of the quantization process of the current group of adjusted extreme values for the Mth network layer;
and comparing the loss values at the inflection points corresponding to each group of adjusted extreme values, and taking the group of adjusted extreme values with the smallest loss value as the target extreme values.
In a possible implementation, the quantizing of each network layer whose contribution degree is greater than or equal to a preset threshold as a first target network layer includes:
taking each network layer whose contribution degree is greater than or equal to the preset threshold as a first target network layer;
adjusting the extreme values of the value ranges of all known parameters to be quantized of the first target network layer, and quantizing the first target network layer according to a plurality of groups of adjusted extreme values;
in the case that the first target network layer is the Nth network layer of the target detection model, selecting the target extreme values from the plurality of groups of adjusted extreme values according to the output values of the Nth network layer and of the network layers after it, before and after each quantization of the Nth network layer;
and using the target extreme values for the Nth network layer of the quantization model.
In a possible embodiment, the adjusting of the extreme values of the value ranges of all known parameters to be quantized of the first target network layer includes:
determining the value ranges of all known parameters to be quantized according to the initial extreme values of all known parameters to be quantized of the first target network layer;
narrowing the value ranges of all known parameters to be quantized according to a preset step length;
and determining each group of adjusted extreme values according to the value range after each narrowing.
In a possible implementation, the quantizing of the first target network layer according to the plurality of groups of adjusted extreme values includes:
calculating a quantization step and a quantization zero point according to each group of adjusted extreme values;
and quantizing the parameters to be quantized of the Nth network layer based on the quantization step and the quantization zero point.
In a possible embodiment, in the case that the first target network layer is the Nth network layer of the target detection model, the selecting of the target extreme values from the plurality of groups of adjusted extreme values according to the output values of the Nth network layer and of the network layers after it, before and after each quantization of the Nth network layer, includes:
for each group of adjusted extreme values used to quantize the Nth network layer, sequentially calculating the output values of the Nth network layer and of the network layers after it, before and after quantization of the Nth network layer, according to the order of the network layers in the target detection model;
calculating the loss values of the Nth network layer and of the network layers after it according to their output values before and after quantization;
ordering the loss values of the Nth network layer and of the network layers after it according to the order of the network layers in the target detection model, and determining a loss value inflection point from the ordered loss values;
taking the loss value inflection point as the termination point of the quantization process of the current group of adjusted extreme values for the Nth network layer;
and comparing the loss values at the inflection points corresponding to each group of adjusted extreme values, and taking the group of adjusted extreme values with the smallest loss value as the target extreme values.
In a second aspect, an embodiment of the present disclosure provides an apparatus for generating a quantization model, including:
the first quantization module is used for quantizing a pre-trained target detection model;
the first determining module is used for determining the contribution degree of each network layer to the target detection model according to the difference value of the output values of each network layer of the target detection model before and after quantization;
the second determining module is used for grouping all network layers whose contribution degree is smaller than a preset threshold and determining the network layers of each group;
and the second quantization module is used for performing parallel quantization on each group of network layers and each network layer with the contribution degree greater than or equal to a preset threshold value to obtain a quantization model corresponding to the target detection model.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and a processor for realizing the method for generating the quantization model when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for generating a quantization model described above.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure at least has part or all of the following advantages:
the method for generating the quantitative model, which is disclosed by the embodiment of the disclosure, quantifies a pre-trained target detection model; determining the contribution degree of each network layer to the target detection model according to the difference value of the output values of each network layer of the target detection model before and after quantization; grouping all network layers with contribution degrees smaller than a preset threshold value, and determining the network layer of each group; the network layers of each group and the network layers with the contribution degree larger than or equal to the preset threshold are quantized in parallel to obtain the quantization model corresponding to the target detection model, all the network layers are divided according to the contribution degree of each network layer, all the network layers with the contribution degree smaller than the preset threshold are grouped, a plurality of network layers with the contribution degree smaller than the preset threshold of each group and a single network layer with the contribution degree larger than or equal to the preset threshold are quantized in parallel, and the speed of the quantization process can be improved while the precision of the quantization process is ensured.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the related art are briefly described below; other drawings can be obtained by those skilled in the art from these drawings without inventive effort.
Fig. 1 schematically illustrates a flow diagram of a method of generating a quantization model according to an embodiment of the present disclosure;
FIG. 2 schematically shows a schematic diagram of a distribution of contribution degrees of a network layer of a model according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a sliding window schematic in a quantization process according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of an apparatus for generation of a quantization model according to an embodiment of the present disclosure; and
fig. 5 schematically shows a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some embodiments of the present disclosure, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
Referring to fig. 1, an embodiment of the present disclosure provides a method for generating a quantization model, the method including:
s1, quantizing a pre-trained target detection model;
in some embodiments, the pre-trained target detection model may be an image classification model or a face recognition model, among others.
S2, determining the contribution degree of each network layer to the target detection model according to the difference value of the output values of each network layer of the target detection model before and after quantization;
in some embodiments, in the case where the object detection model includes 50 network layers, the contribution degree distribution of the 50 network layers is as shown in fig. 2. In practical applications, the number of network layers in the target detection model is not limited.
S3, grouping all network layers whose contribution degree is smaller than a preset threshold, and determining the network layers of each group;
in some embodiments, the preset threshold may be a preset value or an average value of the calculated contribution degrees of all network layers.
And S4, carrying out parallel quantization on each group of network layers and each network layer with the contribution degree larger than or equal to a preset threshold value to obtain a quantization model corresponding to the target detection model.
In this embodiment, in step S2, the contribution degree of each network layer to the target detection model is determined according to the difference between the output values of each network layer of the target detection model before and after quantization, by an expression of the form:

S = Σ_i (x_i − x̂_i)²

wherein S is the contribution degree of the current network layer to the target detection model, x_i is the i-th output value of the current network layer before quantization, and x̂_i is the i-th output value of the current network layer after quantization.
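By way of illustration only, a minimal Python sketch of this per-layer contribution computation, assuming the sum-of-squared-differences form reconstructed above and random stand-in arrays for each layer's outputs (the patent does not prescribe an implementation):

    import numpy as np

    def layer_contribution(float_out: np.ndarray, quant_out: np.ndarray) -> float:
        # Contribution degree S of one layer: squared differences between its
        # pre- and post-quantization outputs, summed over all output values.
        return float(np.sum((float_out - quant_out) ** 2))

    rng = np.random.default_rng(0)
    float_outputs = [rng.normal(size=256) for _ in range(50)]  # outputs before quantization
    quant_outputs = [o + rng.normal(scale=0.01, size=256) for o in float_outputs]  # after

    contributions = [layer_contribution(f, q) for f, q in zip(float_outputs, quant_outputs)]
    threshold = float(np.mean(contributions))  # one option for the preset threshold (the mean)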
In this embodiment, in step S1, the quantizing of the pre-trained target detection model includes:
for each weight factor of each network layer of the pre-trained target detection model, reducing the current bit width to a target bit width, wherein the target bit width is smaller than the current bit width. In practical applications, the target bit width may be 4 bits.
In this embodiment, in step S3, the grouping of all network layers whose contribution degree is smaller than the preset threshold, and determining the network layers of each group, includes:
sorting all network layers whose contribution degree is smaller than the preset threshold according to their order in the target detection model, to obtain the sorted network layers;
and grouping the sorted network layers and determining the network layers of each group, wherein the number of network layers in each group is a preset value.
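Continuing the sketch above (contributions and threshold as computed there; the group size of 4 is a hypothetical preset value), the sorting-and-grouping step might look like:

    GROUP_SIZE = 4  # hypothetical preset number of network layers per group

    def group_low_contribution_layers(contributions, threshold, group_size=GROUP_SIZE):
        # Sort the indices of low-contribution layers by their order in the model,
        # then split them into consecutive groups of the preset size.
        low = sorted(i for i, s in enumerate(contributions) if s < threshold)
        return [low[k:k + group_size] for k in range(0, len(low), group_size)]

    groups = group_low_contribution_layers(contributions, threshold)
    high_layers = [i for i, s in enumerate(contributions) if s >= threshold]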
In this embodiment, in step S4, the performing of parallel quantization on each group of network layers and on each network layer whose contribution degree is greater than or equal to the preset threshold includes:
allocating a parallel thread to each network layer whose contribution degree is greater than or equal to the preset threshold, and quantizing each such network layer as a first target network layer;
and allocating a parallel thread to each group of network layers, and quantizing each network layer in each group as a second target network layer.
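One way this thread allocation could be sketched with Python's standard concurrent.futures, continuing from the grouping above; quantize_layer and quantize_group are placeholders for the per-layer and per-group routines described below, not part of the patent:

    from concurrent.futures import ThreadPoolExecutor

    def quantize_layer(layer_idx):
        # Placeholder: quantize one high-contribution layer independently.
        ...

    def quantize_group(layer_indices):
        # Placeholder: quantize a group's layers one by one, in model order.
        for idx in sorted(layer_indices):
            quantize_layer(idx)

    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(quantize_layer, i) for i in high_layers]  # one thread per layer
        futures += [pool.submit(quantize_group, g) for g in groups]      # one thread per group
        for f in futures:
            f.result()  # wait and propagate any exception raised in a worker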
In this embodiment, the quantizing each network layer in each group of network layers as the second target network layer includes:
taking each network layer in each group of network layers as a second target network layer;
and quantizing the second target network layer in sequence according to the sequence of each group of network layers in the target detection model.
In this embodiment, the quantizing of the second target network layer includes:
adjusting the extreme values of the value ranges of all known parameters to be quantized of the second target network layer, and quantizing the second target network layer according to a plurality of groups of adjusted extreme values;
in the case that the second target network layer is the Mth network layer of the target detection model, for each group of adjusted extreme values used to quantize the Mth network layer, sequentially calculating the output values, before and after quantization of the Mth network layer, of each network layer from the Mth network layer to a third target network layer, according to the order of the network layers in the target detection model, wherein the third target network layer is the first network layer after the Mth network layer among the sorted network layers;
calculating the loss value of each network layer from the Mth network layer to the third target network layer according to its output values before and after quantization of the Mth network layer;
ordering the loss values of the network layers from the Mth network layer to the third target network layer according to the order of the network layers in the target detection model, and determining a loss value inflection point from the ordered loss values;
taking the loss value inflection point as the termination point of the quantization process of the current group of adjusted extreme values for the Mth network layer;
and comparing the loss values at the inflection points corresponding to each group of adjusted extreme values, and taking the group of adjusted extreme values with the smallest loss value as the target extreme values.
As shown in fig. 2, the sorted network layers may be 2, 3, 5, 6, 7, 8, …. Assuming that M is 3, it is only necessary to calculate the loss values of the 3rd network layer and the 4th network layer, and to take the smaller of the two loss values as the loss value inflection point.
In some embodiments, the loss value of each network layer from the Mth network layer to the third target network layer is calculated from that network layer's output values before and after quantization of the Mth network layer, by an expression of the form:

L = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²

wherein L is the loss value of the current network layer, n is the number of output values of the current network layer, y_i is the i-th output value of the current network layer before quantization of the Mth network layer, and ŷ_i is the i-th output value of the current network layer after quantization of the Mth network layer.
In some embodiments, the loss value inflection point may be determined by:
drawing a curve of loss value versus network layer number, with the loss value as the ordinate and the network layer number as the abscissa;
fitting a relation function to the loss value–network layer number curve;
and taking the loss value at which the derivative of the relation function is 0 as the loss value inflection point.
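A sketch of this inflection-point search; the cubic degree of the fitted function is an assumption, since the patent only states that a relation function is fitted:

    import numpy as np

    def loss_inflection_point(losses):
        # Fit loss vs. layer index with a cubic polynomial, then return the layer
        # (and its loss) where the derivative of the fit first crosses zero.
        x = np.arange(len(losses), dtype=float)
        fit = np.poly1d(np.polyfit(x, losses, deg=3))   # fitted relation function
        deriv = np.polyder(fit)
        roots = [r.real for r in deriv.roots
                 if abs(r.imag) < 1e-9 and 0 <= r.real <= x[-1]]
        if not roots:                                   # no zero derivative in range
            return len(losses) - 1, losses[-1]
        k = int(round(min(roots)))
        return k, losses[k]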
In some embodiments, each group of adjusted extreme values used to quantize the 1st network layer corresponds to one quantization process. In each quantization process, sequentially calculating the output values of each network layer from the 1st network layer to the third target network layer, before and after quantization of the 1st network layer, may be implemented with a sliding window. Specifically, when calculating the output values of the 1st network layer before and after its quantization, the sliding window framed by the dashed line contains only the 1st network layer (layer-1); when calculating the output values of the 2nd network layer before and after quantization of the 1st network layer, the sliding window framed by the solid line contains the 1st and 2nd network layers (layer-1 and layer-2), as shown in fig. 3.
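The sliding-window evaluation could be sketched as follows, treating a "model" as a plain list of callables purely for illustration; forward_upto is a hypothetical helper, not part of the patent, and the MSE loss form assumed above is reused:

    import numpy as np

    def forward_upto(layers, x, end):
        # Run input x through layers[0..end] and return the output of layer `end`.
        for layer in layers[:end + 1]:
            x = layer(x)
        return x

    def window_losses(float_layers, quantized_layers, calib_input, m, third_target):
        # For one candidate quantization of layer m, grow the sliding window from
        # layer m to the third target layer and record each layer's MSE loss.
        losses = []
        for end in range(m, third_target + 1):   # windows {m}, {m, m+1}, ...
            y_f = forward_upto(float_layers, calib_input, end)
            y_q = forward_upto(quantized_layers, calib_input, end)
            losses.append(float(np.mean((y_f - y_q) ** 2)))
        return losses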
In this embodiment, the quantizing of each network layer whose contribution degree is greater than or equal to the preset threshold as the first target network layer includes:
taking each network layer whose contribution degree is greater than or equal to the preset threshold as a first target network layer;
adjusting the extreme values of the value ranges of all known parameters to be quantized of the first target network layer, and quantizing the first target network layer according to a plurality of groups of adjusted extreme values;
in the case that the first target network layer is the Nth network layer of the target detection model, selecting the target extreme values from the plurality of groups of adjusted extreme values according to the output values of the Nth network layer and of the network layers after it, before and after each quantization of the Nth network layer;
and using the target extreme values for the Nth network layer of the quantization model.
In this embodiment, the adjusting of the extreme values of the value ranges of all known parameters to be quantized of the first target network layer includes:
determining the value ranges of all known parameters to be quantized according to the initial extreme values of all known parameters to be quantized of the first target network layer;
narrowing the value ranges of all known parameters to be quantized according to a preset step length;
and determining each group of adjusted extreme values according to the value range after each narrowing.
In practical applications, the preset step length may be a preset value, or may be 10% of the difference between the maximum and minimum of the value range.
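A sketch of this range-narrowing schedule; the symmetric narrowing and the number of candidate groups are illustrative assumptions:

    def candidate_extremes(w_min, w_max, num_groups=8):
        # Narrow [w_min, w_max] by a preset step per group (here 10% of the
        # initial range, split evenly between both ends) and return each
        # group's adjusted (min, max) pair.
        step = 0.10 * (w_max - w_min)
        pairs = []
        for k in range(num_groups):
            lo, hi = w_min + k * step / 2, w_max - k * step / 2
            if lo >= hi:
                break
            pairs.append((lo, hi))
        return pairs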
In this embodiment, the quantizing of the first target network layer according to the plurality of groups of adjusted extreme values includes:
calculating a quantization step and a quantization zero point according to each group of adjusted extreme values;
and quantizing the parameters to be quantized of the Nth network layer based on the quantization step and the quantization zero point.
In practical applications, the quantization step is calculated by the following expression:
s = [max(x) − min(x)] / (2^bit − 1)
wherein s is the quantization step, max(x) and min(x) are the maximum and minimum of the current group of adjusted extreme values, and bit is the target bit width.
In practical applications, the quantization zero point is calculated by the following expression:
z = round(−min(x) / s)
wherein z is the quantization zero point, min(x) is the minimum of the current group of adjusted extreme values, and s is the quantization step.
In practical applications, the parameters to be quantized of the Nth network layer are quantized based on the quantization step and the quantization zero point by expressions of the form:

x_q = clamp(round(x / s) + z, 0, 2^b − 1)
x̂ = s · (x_q − z)

wherein x is the weight factor before quantization, x_q is the quantized weight factor, x̂ is the de-quantized weight factor, s is the quantization step, z is the quantization zero point, and b is the target bit width of the weight factors.
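Putting the quantization step, zero point, and rounding expressions together, a minimal sketch assuming unsigned b-bit uniform quantization as reconstructed above:

    import numpy as np

    def quantize_params(w: np.ndarray, lo: float, hi: float, bit: int = 4):
        # One group of adjusted extreme values (lo, hi) gives the step s and the
        # zero point z; weights are then rounded, shifted, and clamped to b bits.
        s = (hi - lo) / (2 ** bit - 1)           # quantization step
        z = int(round(-lo / s))                  # quantization zero point
        q = np.clip(np.round(w / s) + z, 0, 2 ** bit - 1).astype(np.int32)
        w_hat = s * (q.astype(np.float32) - z)   # de-quantized weights
        return q, w_hat

    # Example: quantize random weights to 4 bits using their own min/max.
    w = np.random.default_rng(1).normal(size=64).astype(np.float32)
    q, w_hat = quantize_params(w, float(w.min()), float(w.max()), bit=4)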
In this embodiment, in the case that the first target network layer is the Nth network layer of the target detection model, the selecting of the target extreme values from the plurality of groups of adjusted extreme values according to the output values of the Nth network layer and of the network layers after it, before and after each quantization of the Nth network layer, includes:
for each group of adjusted extreme values used to quantize the Nth network layer, sequentially calculating the output values of the Nth network layer and of the network layers after it, before and after quantization of the Nth network layer, according to the order of the network layers in the target detection model;
calculating the loss values of the Nth network layer and of the network layers after it according to their output values before and after quantization;
ordering the loss values of the Nth network layer and of the network layers after it according to the order of the network layers in the target detection model, and determining a loss value inflection point from the ordered loss values;
taking the loss value inflection point as the termination point of the quantization process of the current group of adjusted extreme values for the Nth network layer;
and comparing the loss values at the inflection points corresponding to each group of adjusted extreme values, and taking the group of adjusted extreme values with the smallest loss value as the target extreme values.
The method provides a sliding-window mode to guide model quantization. To address the long processing time of the sliding window, a dynamic quantization mode is further provided: network layers with low contribution degree are quantized synchronously on multiple threads, which improves the quantization efficiency of the sliding window and thus balances the precision of the quantized model against the speed of the quantization process.
In the prior art, the sliding window of each network layer traverses from the current layer to the last layer, and the time complexity consumed is T × N!, wherein T represents the time to search one layer and N! represents the factorial of N.
Referring to fig. 4, an embodiment of the present disclosure provides a generation apparatus of a quantization model, including:
a first quantization module 41, configured to quantize a pre-trained target detection model;
a first determining module 42, configured to determine a contribution degree of each network layer to the target detection model according to a difference between output values of each network layer of the target detection model before quantization and output values of each network layer after quantization;
a second determining module 43, configured to group all network layers whose contribution degree is smaller than a preset threshold, and to determine the network layers of each group;
and a second quantization module 44, configured to perform parallel quantization on each group of network layers and each network layer whose contribution is greater than or equal to a preset threshold, so as to obtain a quantization model corresponding to the target detection model.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
In the second embodiment, any number of the first quantization module 41, the first determination module 42, the second determination module 43, and the second quantization module 44 may be combined and implemented in one module, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. At least one of the first quantization module 41, the first determination module 42, the second determination module 43 and the second quantization module 44 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of three implementations of software, hardware and firmware, or in any suitable combination of any of them. Alternatively, at least one of the first quantization module 41, the first determination module 42, the second determination module 43 and the second quantization module 44 may be at least partly implemented as a computer program module, which when executed may perform a corresponding function.
Referring to fig. 5, an electronic device provided by an embodiment of the present disclosure includes a processor 1110, a communication interface 1120, a memory 1130, and a communication bus 1140, where the processor 1110, the communication interface 1120, and the memory 1130 complete communication with each other through the communication bus 1140;
a memory 1130 for storing computer programs;
the processor 1110, when executing the program stored in the memory 1130, implements a method of generating a quantization model as follows:
quantizing a pre-trained target detection model;
determining the contribution degree of each network layer to the target detection model according to the difference value of the output values of each network layer of the target detection model before and after quantization;
grouping all network layers whose contribution degree is smaller than a preset threshold, and determining the network layers of each group;
and carrying out parallel quantization on each group of network layers and each network layer with the contribution degree greater than or equal to a preset threshold value to obtain a quantization model corresponding to the target detection model.
The communication bus 1140 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 1140 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this is not intended to represent only one bus or type of bus.
The communication interface 1120 is used for communication between the electronic device and other devices.
The Memory 1130 may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory 1130 may also be at least one memory device located remotely from the processor 1110.
The Processor 1110 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the Integrated Circuit may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.
Embodiments of the present disclosure also provide a computer-readable storage medium. The above-mentioned computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the method of generating a quantization model as described above.
The computer-readable storage medium may be contained in the apparatus/device described in the above embodiments; or may be present alone without being assembled into the device/apparatus. The above-described computer-readable storage medium carries one or more programs which, when executed, implement a method of generating a quantization model according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A method of generating a quantization model, the method comprising:
quantizing a pre-trained target detection model;
determining the contribution degree of each network layer to the target detection model according to the difference value of the output values of each network layer of the target detection model before and after quantization;
grouping all network layers whose contribution degree is smaller than a preset threshold, and determining the network layers of each group;
and carrying out parallel quantization on each group of network layers and each network layer with the contribution degree greater than or equal to a preset threshold value to obtain a quantization model corresponding to the target detection model.
2. The method of claim 1, wherein the contribution degree of each network layer to the target detection model is determined according to the difference between the output values of each network layer of the target detection model before and after quantization, by an expression of the form:

S = Σ_i (x_i − x̂_i)²

wherein S is the contribution degree of the current network layer to the target detection model, x_i is the i-th output value of the current network layer before quantization, and x̂_i is the i-th output value of the current network layer after quantization.
3. The method of claim 1, wherein the quantizing of the pre-trained target detection model comprises:
for each weight factor of each network layer of the pre-trained target detection model, reducing the current bit width to a target bit width, wherein the target bit width is smaller than the current bit width.
4. The method according to claim 1, wherein the grouping of all network layers whose contribution degree is smaller than a preset threshold, and determining the network layers of each group, comprises:
sorting all network layers whose contribution degree is smaller than the preset threshold according to their order in the target detection model, to obtain the sorted network layers;
and grouping the sorted network layers and determining the network layers of each group, wherein the number of network layers in each group is a preset value.
5. The method according to claim 1, wherein the performing of parallel quantization on each group of network layers and on each network layer whose contribution degree is greater than or equal to a preset threshold comprises:
allocating a parallel thread to each network layer whose contribution degree is greater than or equal to the preset threshold, and quantizing each such network layer as a first target network layer;
and allocating a parallel thread to each group of network layers, and quantizing each network layer in each group as a second target network layer.
6. The method of claim 5, wherein quantizing each network layer of each group as a second target network layer comprises:
taking each network layer in each group of network layers as a second target network layer;
and quantizing the second target network layer in sequence according to the sequence of each group of network layers in the target detection model.
7. The method of claim 6, wherein the quantizing of the second target network layer comprises:
adjusting the extreme values of the value ranges of all known parameters to be quantized of the second target network layer, and quantizing the second target network layer according to a plurality of groups of adjusted extreme values;
in the case that the second target network layer is the Mth network layer of the target detection model, for each group of adjusted extreme values used to quantize the Mth network layer, sequentially calculating the output values, before and after quantization of the Mth network layer, of each network layer from the Mth network layer to a third target network layer, according to the order of the network layers in the target detection model, wherein the third target network layer is the first network layer after the Mth network layer among the sorted network layers;
calculating the loss value of each network layer from the Mth network layer to the third target network layer according to its output values before and after quantization of the Mth network layer;
ordering the loss values of the network layers from the Mth network layer to the third target network layer according to the order of the network layers in the target detection model, and determining a loss value inflection point from the ordered loss values;
taking the loss value inflection point as the termination point of the quantization process of the current group of adjusted extreme values for the Mth network layer;
and comparing the loss values at the inflection points corresponding to each group of adjusted extreme values, and taking the group of adjusted extreme values with the smallest loss value as the target extreme values.
8. The method according to claim 5, wherein the quantizing of each network layer whose contribution degree is greater than or equal to a preset threshold as a first target network layer comprises:
taking each network layer whose contribution degree is greater than or equal to the preset threshold as a first target network layer;
adjusting the extreme values of the value ranges of all known parameters to be quantized of the first target network layer, and quantizing the first target network layer according to a plurality of groups of adjusted extreme values;
in the case that the first target network layer is the Nth network layer of the target detection model, selecting the target extreme values from the plurality of groups of adjusted extreme values according to the output values of the Nth network layer and of the network layers after it, before and after each quantization of the Nth network layer;
and using the target extreme values for the Nth network layer of the quantization model.
9. The method of claim 8, wherein the adjusting of the extreme values of the value ranges of all known parameters to be quantized of the first target network layer comprises:
determining the value ranges of all known parameters to be quantized according to the initial extreme values of all known parameters to be quantized of the first target network layer;
narrowing the value ranges of all known parameters to be quantized according to a preset step length;
and determining each group of adjusted extreme values according to the value range after each narrowing.
10. The method of claim 8, wherein the quantizing of the first target network layer according to the plurality of groups of adjusted extreme values comprises:
calculating a quantization step and a quantization zero point according to each group of adjusted extreme values;
and quantizing the parameters to be quantized of the Nth network layer based on the quantization step and the quantization zero point.
11. The method according to claim 8, wherein, in the case that the first target network layer is the Nth network layer of the target detection model, the selecting of the target extreme values from the plurality of groups of adjusted extreme values according to the output values of the Nth network layer and of the network layers after it, before and after each quantization of the Nth network layer, comprises:
for each group of adjusted extreme values used to quantize the Nth network layer, sequentially calculating the output values of the Nth network layer and of the network layers after it, before and after quantization of the Nth network layer, according to the order of the network layers in the target detection model;
calculating the loss values of the Nth network layer and of the network layers after it according to their output values before and after quantization;
ordering the loss values of the Nth network layer and of the network layers after it according to the order of the network layers in the target detection model, and determining a loss value inflection point from the ordered loss values;
taking the loss value inflection point as the termination point of the quantization process of the current group of adjusted extreme values for the Nth network layer;
and comparing the loss values at the inflection points corresponding to each group of adjusted extreme values, and taking the group of adjusted extreme values with the smallest loss value as the target extreme values.
12. An apparatus for generating a quantization model, comprising:
the first quantization module is used for quantizing a pre-trained target detection model;
the first determining module is used for determining the contribution degree of each network layer to the target detection model according to the difference value of the output values of each network layer of the target detection model before and after quantization;
the second determining module is used for grouping all network layers whose contribution degree is smaller than a preset threshold and determining the network layers of each group;
and the second quantization module is used for performing parallel quantization on each group of network layers and each network layer with the contribution degree greater than or equal to a preset threshold value to obtain a quantization model corresponding to the target detection model.
13. An electronic device is characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method of generating a quantitative model as claimed in any one of claims 1 to 11 when executing a program stored in a memory.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method of generating a quantization model according to any one of claims 1-11.
CN202310005904.2A 2023-01-04 2023-01-04 Quantization model generation method and device, electronic equipment and storage medium Active CN115759238B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310005904.2A CN115759238B (en) 2023-01-04 2023-01-04 Quantization model generation method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115759238A true CN115759238A (en) 2023-03-07
CN115759238B CN115759238B (en) 2023-08-11

Family

ID=85348182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310005904.2A Active CN115759238B (en) 2023-01-04 2023-01-04 Quantization model generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115759238B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738401A (en) * 2019-03-25 2020-10-02 北京三星通信技术研究有限公司 Model optimization method, grouping compression method, corresponding device and equipment
US20220309321A1 (en) * 2021-03-24 2022-09-29 Panasonic Intellectual Property Management Co., Ltd. Quantization method, quantization device, and recording medium
CN112926570A (en) * 2021-03-26 2021-06-08 上海交通大学 Adaptive bit network quantization method, system and image processing method
CN113673532A (en) * 2021-10-21 2021-11-19 北京科技大学 Optimization method and device of quantization model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUO Lili; GAO Fei: "Optimal distributed quantization detection algorithm based on progressive performance analysis", Journal of Central South University (Science and Technology), no. 12, pp. 4529-4534 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116739039A (en) * 2023-05-05 2023-09-12 北京百度网讯科技有限公司 Quantization method, device, equipment and medium of distributed deployment model

Also Published As

Publication number Publication date
CN115759238B (en) 2023-08-11

Similar Documents

Publication Publication Date Title
CN109840589B (en) Method and device for operating convolutional neural network on FPGA
CN111444009B (en) Resource allocation method and device based on deep reinforcement learning
CN106855952B (en) Neural network-based computing method and device
TW202119293A (en) Method and system of quantizing artificial neural network and arti ficial neural network apparatus
CN111625816A (en) Intrusion detection method and device
CN111091184B (en) Deep neural network quantification method and device, electronic equipment and medium
CN113705775A (en) Neural network pruning method, device, equipment and storage medium
US20200184245A1 (en) Improper neural network input detection and handling
CN115759238A (en) Method and device for generating quantization model, electronic equipment and storage medium
CN111881972B (en) Black-out user identification method and device, server and storage medium
CN111563589A (en) Quantification method and device of neural network model
CN113449854A (en) Method and device for quantifying mixing precision of network model and computer storage medium
CN112150497A (en) Local activation method and system based on binary neural network
WO2021012148A1 (en) Data processing method and apparatus based on deep neural network, and mobile device
CN111027684A (en) Deep learning model quantification method and device, electronic equipment and storage medium
CN111159169B (en) Data management method and equipment
CN113408704A (en) Data processing method, device, equipment and computer readable storage medium
CN112561050B (en) Neural network model training method and device
CN114841325A (en) Data processing method and medium of neural network model and electronic device
CN114662485A (en) Translation model compression method, translation method and related device
CN111598233A (en) Compression method, device and equipment of deep learning model
CN112668702B (en) Fixed-point parameter optimization method, system, terminal and storage medium
CN111290850B (en) Data storage method, device and equipment
CN117348837A (en) Quantization method and device for floating point precision model, electronic equipment and storage medium
CN115630692A (en) Method and device for adjusting calculation precision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant