CN114897062A

CN114897062A - Target detection method, target detection device, electronic equipment, target detection medium and product

Info

Publication number: CN114897062A
Application number: CN202210459052.XA
Authority: CN
Inventors: 陆强
Original assignee: International Network Technology Shanghai Co Ltd
Current assignee: International Network Technology Shanghai Co Ltd
Priority date: 2022-04-27
Filing date: 2022-04-27
Publication date: 2022-08-12

Abstract

The invention provides a target detection method, a target detection device, an electronic device, a medium and a product. The target detection method comprises the following steps: performing quantization sensitivity analysis on each network layer in a target detection model, and determining a quantization bit width for each network layer based on the quantization sensitivity analysis result of that layer, the per-layer quantization bit widths together serving as a mixed quantization bit width, wherein the target detection model is obtained by training based on training data and labels corresponding to the training data; quantizing the target detection model based on the mixed quantization bit width to obtain a quantized target detection model; and acquiring a picture to be detected and inputting it into the quantized target detection model to obtain a corresponding target detection result. The method ensures that the finally quantized target detection model retains the desired performance while achieving maximum model compression, keeps the whole quantization process short, allows the target detection model to be deployed smoothly to a mobile terminal or embedded device, and balances device performance against model performance.

Description

Target detection method, target detection device, electronic equipment, target detection medium and product

Technical Field

The invention relates to the technical field of model compression, in particular to a target detection method, a target detection device, electronic equipment, a medium and a product.

Background

At present, when a deep-learning-based target detection model is deployed to a mobile terminal or an embedded device, the deployment is affected by many factors, such as the performance requirements of the target device, the frame rate of the sensor, and the complexity of the network structure. As a result, it is difficult to balance device performance against network structure complexity, and difficult to achieve low power consumption, a lightweight network structure, and high real-time detection precision at the same time. These problems need to be solved through model compression.

Model quantization is a representative model compression method. It converts the weights, activation values and the like of a network model from floating-point storage (and operation) to integer storage (and operation), thereby reducing the size and inference time of the deep neural network model. It is applicable to most models and to different hardware devices.
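As an illustration of the float-to-integer conversion described above, the following is a minimal sketch of one common scheme, symmetric uniform quantization with a single scale per tensor. The source does not specify which quantization scheme is used, so the function names `quantize_symmetric` and `dequantize`, the per-tensor scale, and the sample weights are all assumptions made for illustration.

```python
def quantize_symmetric(values, bits):
    """Map floats to signed integers of the given bit width (assumed scheme)."""
    if bits < 2:
        raise ValueError("this sketch only covers signed widths of 2 bits or more")
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical float32 weights of one layer.
weights = [0.50, -1.27, 0.03, 0.91]
q8, scale8 = quantize_symmetric(weights, bits=8)
restored = dequantize(q8, scale8)
```

Each restored value differs from the original by at most half a quantization step, which is why a smaller bit width (a coarser step) costs more precision.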

According to the number of bits required to store a weight, existing model quantization bit widths include 8-bit, 4-bit, 2-bit and 1-bit quantization, among others. Although these can compress the model greatly, model performance generally degrades because all parameters are uniformly quantized into one format, and the smaller the bit width of the quantized values, the greater the degradation.

To accelerate computation during model deployment while avoiding excessive performance degradation, mixed quantization is adopted, that is, the bit widths of the quantized values of different parameters may differ (for example, float32 is quantized to 8 bits, 4 bits, 2 bits or 1 bit). However, in existing mixed quantization, the quantization format of each layer is either specified manually or found by iteratively searching over mixed quantization formats during training. Manually specifying the quantization of each network layer still causes a large precision loss, while iterative search during training makes the whole quantization process too time-consuming.

In summary, existing model quantization suffers from a poor balance between precision loss and model compression and from excessive time consumption, which directly affects the performance and efficiency of the target detection model after it is deployed to a mobile terminal or embedded device.

Disclosure of Invention

The invention provides a target detection method, a target detection device, an electronic device, a medium and a product, which are used to solve the above problems.

The invention provides a target detection method, which comprises the following steps:

training based on the training data and the labels corresponding to the training data to obtain a target detection model;

performing quantization sensitivity analysis on each network layer in the target detection model, and determining a quantization bit width corresponding to each network layer as a mixed quantization bit width based on the quantization sensitivity analysis result of each network layer; the target detection model is obtained by training based on training data and labels corresponding to the training data;

quantizing the target detection model based on the mixed quantization bit width to obtain a quantized target detection model;

and acquiring a picture to be detected, and inputting it into the quantized target detection model to obtain a corresponding target detection result.

According to a target detection method provided by the present invention, the performing a quantization sensitivity analysis on each network layer in the target detection model, and determining a quantization bit width corresponding to each network layer as a mixed quantization bit width based on a quantization sensitivity analysis result of each network layer includes:

acquiring network parameters of each network layer in the target detection model, and analyzing the dispersion degree of the network parameters to obtain a dispersion analysis result;

and on the basis of the dispersion analysis result, performing quantization sensitivity analysis on each network layer in sequence, and determining the quantization bit width corresponding to each network layer, so as to obtain a quantization bit width set corresponding to each network layer as a mixed quantization bit width.

According to a target detection method provided by the present invention, the obtaining of network parameters of each network layer in the target detection model and analyzing a dispersion degree of the network parameters to obtain a dispersion analysis result includes:

acquiring network parameters of each network layer in the target detection model, and calculating the variance corresponding to each network layer based on the network parameters to obtain the variance of the network layer parameters as a dispersion analysis result;

correspondingly, based on the dispersion analysis result, performing quantization sensitivity analysis on each network layer in sequence, and determining the quantization bit width corresponding to each network layer, thereby obtaining a quantization bit width set corresponding to each network layer as a mixed quantization bit width, including:

sequentially performing quantization sensitivity analysis on each network layer based on the dispersion analysis result and a predefined quantization bit width set, and determining a quantization bit width corresponding to each network layer from the predefined quantization bit width set, so as to obtain a quantization bit width set corresponding to each network layer as a mixed quantization bit width;

wherein the predefined set of quantization bit widths comprises a first quantization bit width and a second quantization bit width, the second quantization bit width being greater than the first quantization bit width.

According to a target detection method provided by the present invention, the performing quantization sensitivity analysis on each network layer in sequence based on the network layer parameter variance and a predefined quantization bit width set, and determining a quantization bit width corresponding to each network layer from the predefined quantization bit width set, so as to obtain a quantization bit width set corresponding to each network layer as a mixed quantization bit width includes:

S1, sorting the network layer parameter variances in descending order to obtain a variance sequence, and taking the corresponding network layers in turn as the network layer to be quantized according to the variance sequence;

S2, quantizing the network layer to be quantized by using the first quantization bit width and the second quantization bit width respectively, and correspondingly obtaining a first quantized target detection model and a second quantized target detection model;

S3, respectively evaluating the performances of the first quantized target detection model and the second quantized target detection model to correspondingly obtain a first model performance and a second model performance;

S4, determining the quantization bit width corresponding to the network layer to be quantized according to the difference value between the first model performance and the second model performance;

and S5, repeating the steps from S2 to S4 until the corresponding quantization bit width has been determined for all network layers, so as to obtain a quantization bit width set as a mixed quantization bit width.

According to a target detection method provided by the present invention, in S4, determining a quantization bit width corresponding to a network layer to be quantized according to a difference between the first model performance and the second model performance, includes:

calculating a difference between the first model performance and the second model performance;

judging whether the difference value exceeds a preset performance difference threshold value,

determining the quantization bit width corresponding to the network layer to be quantized as a second quantization bit width under the condition that the difference value exceeds a preset performance difference threshold value;

and under the condition that the difference value does not exceed a preset performance difference threshold value, determining the quantization bit width corresponding to the network layer to be quantized as a first quantization bit width.

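According to a target detection method provided by the present invention, alternatively, the performing quantization sensitivity analysis on each network layer in sequence based on the network layer parameter variance and a predefined quantization bit width set, and determining a quantization bit width corresponding to each network layer from the predefined quantization bit width set, so as to obtain a quantization bit width set corresponding to each network layer as a mixed quantization bit width includes:

S10, sorting the network layer parameter variances in descending order to obtain a variance sequence, and taking the corresponding network layers in turn as the network layer to be quantized according to the variance sequence;

S20, quantizing the network layer to be quantized by using the first quantization bit width and the second quantization bit width respectively, and correspondingly obtaining a first quantized target detection model and a second quantized target detection model;

S30, respectively evaluating the performances of the first quantized target detection model and the second quantized target detection model to correspondingly obtain a first model performance and a second model performance;

S40, judging whether the difference value between the first model performance and the second model performance exceeds a preset performance difference threshold value,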

determining the quantization bit width corresponding to the network layer to be quantized as a second quantization bit width under the condition that the difference value exceeds a preset performance difference threshold, determining a new network layer to be quantized according to the variance sequence, and repeating the steps from S20 to S40;

and under the condition that the difference value does not exceed the preset performance difference threshold value, determining the quantization bit width corresponding to the network layer to be quantized as the first quantization bit width, and, according to the position of that layer's parameter variance in the variance sequence, setting the quantization bit widths of the network layers ranked after it in the variance sequence to the first quantization bit width as well, thereby determining the quantization bit widths of all the network layers and obtaining the mixed quantization bit width.
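Steps S10 to S40 can be read as an early-stopping search, sketched below under that reading: layers are visited in descending order of variance, and once one layer tolerates the first (narrower) bit width, it and every layer ranked after it receive that bit width without further evaluation. The function name, the `evaluate(layer_name, bits)` callback, the sign convention of the difference (second minus first), and the sample accuracies are all hypothetical.

```python
def choose_bit_widths_early_stop(order, evaluate, first_bits=8, second_bits=16,
                                 threshold=0.05):
    """`order` is the list of layer names already sorted by variance (S10)."""
    chosen = {}
    for index, name in enumerate(order):
        # S20 + S30: quantize only this layer at each bit width and evaluate.
        difference = evaluate(name, second_bits) - evaluate(name, first_bits)
        if difference > threshold:
            # S40, first branch: keep the wider format and move to the next layer.
            chosen[name] = second_bits
            continue
        # S40, second branch: this layer and all later ones get the first width.
        for remaining in order[index:]:
            chosen[remaining] = first_bits
        break
    return chosen


# Hypothetical accuracies; layers "c" and "d" are never evaluated.
accuracy_table = {("a", 8): 0.70, ("a", 16): 0.90,
                  ("b", 8): 0.88, ("b", 16): 0.90}
evaluated = []


def evaluate(name, bits):
    evaluated.append(name)
    return accuracy_table[(name, bits)]


mixed = choose_bit_widths_early_stop(["a", "b", "c", "d"], evaluate)
```

Compared with evaluating every layer, this variant stops evaluating at the first tolerant layer, which is where the additional time saving would come from.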

According to the target detection method provided by the present invention, the quantizing the target detection model based on the mixed quantization bit width to obtain a quantized target detection model includes:

and performing quantization-aware training on the target detection model based on the mixed quantization bit width, thereby obtaining a quantized target detection model.

The present invention also provides a target detection apparatus, comprising:

the mixed quantization determining module is used for performing quantization sensitivity analysis on each network layer in the target detection model, and determining a quantization bit width corresponding to each network layer as a mixed quantization bit width based on the quantization sensitivity analysis result of each network layer; the target detection model is obtained by training based on training data and labels corresponding to the training data;

the model quantization module is used for quantizing the target detection model based on the mixed quantization bit width to obtain a quantized target detection model;

and the target detection module is used for acquiring the picture to be detected and inputting it into the quantized target detection model to obtain a corresponding target detection result.

The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein when the processor executes the program, any one of the above target detection methods is implemented.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the target detection methods described above.

The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements any of the target detection methods described above.

According to the target detection method, the device, the electronic equipment, the medium and the product, the quantization sensitivity analysis is performed on each network layer in the target detection model, so that the quantization bit width corresponding to the network layer is determined, the finally quantized target detection model has ideal performance and can realize maximum model compression, the time consumption of the whole quantization process is short, the target detection model can be smoothly deployed to a mobile terminal or embedded equipment, and the equipment performance and the model performance can be balanced.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a schematic flow chart of a target detection method provided by an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a target detection apparatus according to an embodiment of the present invention;

FIG. 3 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

FIG. 1 is a schematic flow chart of a target detection method provided by an embodiment of the present invention. As shown in FIG. 1, the target detection method includes the following steps:

S101, performing quantization sensitivity analysis on each network layer in the target detection model, and determining a quantization bit width corresponding to each network layer as a mixed quantization bit width based on the quantization sensitivity analysis result of each network layer.

The target detection model is obtained by training based on training data and labels corresponding to the training data.

In this embodiment, the target detection model is obtained by training on vehicle detection training data and the corresponding labels. In other embodiments of the present invention, the target detection model may also be a food material detection model obtained by training on food material training data and the corresponding labels, or a pedestrian detection model obtained by training on pedestrian data and the corresponding labels. The type of the target detection model is not limited in the present invention.

In this step, a quantization sensitivity analysis is performed on each network layer in the target detection model to determine a quantization bit width of each network layer, and then a mixed quantization bit width composed of the quantization bit widths corresponding to each network layer is obtained.

In this embodiment, the candidate quantization bit widths are INT16 and INT8. INT16 quantization and INT8 quantization are each applied to the first network layer in the target detection model, and the two resulting target detection models are evaluated to obtain two performance evaluation values. The two values are compared to determine, from the performance degradation, whether the first network layer requires INT16 quantization or whether INT8 quantization is sufficient, which fixes the quantization bit width corresponding to the first network layer. The quantization bit width of every other network layer in the target detection model is determined in the same way, giving a group of quantization bit widths that serves as the mixed quantization bit width.

In other embodiments of the present invention, the candidate quantization bit widths include INT16, INT8, INT4, INT2 and INT1, and the quantization bit width corresponding to each network layer in the target detection model is selected from these candidates.

S102, quantizing the target detection model based on the mixed quantization bit width to obtain a quantized target detection model.

In this embodiment, assume the mixed quantization bit width specifies INT16 quantization for the first network layer, INT16 quantization for the second network layer, INT16 quantization for the third network layer, and so on, up to INT8 quantization for the n-th network layer. The float32 parameters in the first network layer are then quantized directly into 16-bit parameters, and likewise the parameters of every network layer in the target detection model are quantized from float32 to the corresponding 16-bit or 8-bit type, which yields the quantized target detection model. The quantized model is compressed as far as possible while its performance is preserved, so it can be conveniently applied to devices with limited computing capability, such as smartphones or embedded devices.
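The storage effect of such a per-layer assignment can be worked out with simple arithmetic. The sketch below compares a float32 model with a mixed INT16/INT8 model; the function name `model_size_bits` and the per-layer parameter counts are hypothetical and chosen only to make the calculation concrete.

```python
def model_size_bits(param_counts, bit_widths):
    """Total parameter storage in bits when layer i uses bit_widths[i] bits."""
    return sum(count * bits for count, bits in zip(param_counts, bit_widths))


# Hypothetical parameter counts for a three-layer model.
param_counts = [1000, 4000, 2000]

float32_bits = model_size_bits(param_counts, [32, 32, 32])
mixed_bits = model_size_bits(param_counts, [16, 16, 8])   # INT16, INT16, INT8
compression_ratio = float32_bits / mixed_bits
```

With these assumed counts the mixed assignment needs 96,000 bits against 224,000 bits for float32, a ratio of about 2.3; assigning INT8 to more layers raises the ratio further.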

In addition, the quantization of the target detection model in this step may be performed after training, or may be performed while training, which is not limited in the present invention.

S103, acquiring the picture to be detected, and inputting it into the quantized target detection model to obtain a corresponding target detection result.

In this embodiment, the picture to be detected is a picture of a vehicle to be detected, and after the picture of the vehicle to be detected is input into the quantized target detection model, a corresponding vehicle detection result can be obtained quickly and accurately.

The vehicle detection result comprises the type of the vehicle and the position of the vehicle in the picture. Types of vehicles include cars, trucks, vans, ambulances, buses, bicycles, tricycles, and the like.

According to the target detection method provided by the embodiment of the invention, the quantization sensitivity analysis is carried out on each network layer in the target detection model, so that the quantization bit width corresponding to the network layer is determined, the finally quantized target detection model has ideal performance and can realize maximum model compression, the time consumption of the whole quantization process is short, the target detection model can be smoothly deployed to a mobile terminal or embedded equipment, and the equipment performance and the model performance can be balanced.

Further, the performing quantization sensitivity analysis on each network layer in the target detection model, and determining a quantization bit width corresponding to each network layer as a mixed quantization bit width based on a quantization sensitivity analysis result of each network layer includes:

and acquiring network parameters of each network layer in the target detection model, and analyzing the dispersion degree of the network parameters to obtain a dispersion analysis result.

Specifically, the network parameters of each network layer are acquired. For example, weight parameters and bias parameters are acquired for a convolutional layer, and activation values are acquired for an activation layer.

After the network parameters are obtained, the dispersion degree between the network parameters in each network layer is analyzed by taking the network layer as a unit, so that a dispersion analysis result is obtained.

The dispersion analysis result may be one of a range, a mean deviation, and a standard deviation.

Specifically, the order in which the network layers undergo quantization sensitivity analysis is determined according to the dispersion analysis result corresponding to each network layer, and the quantization sensitivity analysis is performed on the network layers in that order.

The quantization sensitivity analysis specifically comprises the following. If the candidates are limited to the two quantization bit widths INT16 and INT8, INT16 quantization and INT8 quantization are each applied to a given network layer while the parameter data types of the other network layers are kept unchanged. The performance of the target detection model under each of the two quantization bit widths is then calculated, and INT16 or INT8 quantization is selected for that layer by comparing the performance of the two models.

And after the quantization bit widths of all the network layers are determined, obtaining a quantization bit width group as a mixed quantization bit width of the target detection model.

According to the target detection method provided by the embodiment of the invention, the order of quantization sensitivity analysis is determined according to the degree of dispersion of the network parameters, which reduces the time consumed by model quantization. In addition, because quantization sensitivity analysis is performed on each network layer and the quantization bit width is then determined from the standpoint of model performance, the model performance can still meet actual requirements while each network layer is quantized as far as its actual condition allows.

Further, the obtaining of the network parameters of each network layer in the target detection model and analyzing the dispersion degree of the network parameters to obtain a dispersion analysis result includes:

and acquiring network parameters of each network layer in the target detection model, and calculating the variance corresponding to each network layer based on the network parameters to obtain the network layer parameter variance as a dispersion analysis result.

In the present embodiment, the dispersion analysis is performed by calculating the variance between the network parameters.

Correspondingly, quantization sensitivity analysis is performed on each network layer in sequence based on the dispersion analysis result and a predefined quantization bit width set, and the quantization bit width corresponding to each network layer is determined from the predefined quantization bit width set, so that the quantization bit width set corresponding to each network layer is obtained as the mixed quantization bit width.

Specifically, the order of the quantization sensitivity analysis is determined according to the value of the dispersion analysis result, that is, the quantization sensitivity analysis is performed on the network layers in order of their degree of dispersion. Assuming that the dispersion analysis result corresponding to the first network layer is the largest (i.e. its variance is the largest), the first network layer is taken as the first object of quantization sensitivity analysis.
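The ordering described above can be sketched in a few lines. The example computes one variance per layer and sorts the layers from largest to smallest variance. The source does not say whether population or sample variance is meant, so the use of `statistics.pvariance` (population variance) is an assumption, as are the layer names and parameter values.

```python
from statistics import pvariance

# Hypothetical model: layer name -> flat list of that layer's parameters.
layer_params = {
    "conv1": [0.1, -0.1, 0.1, -0.1],
    "conv2": [1.0, -1.0, 2.0, -2.0],
    "fc": [0.5, 0.5, 0.5, 0.5],
}

# Dispersion analysis: one variance value per network layer.
layer_variance = {name: pvariance(params) for name, params in layer_params.items()}

# Analysis order: the layer with the largest variance is analysed first.
analysis_order = sorted(layer_variance, key=layer_variance.get, reverse=True)
```

Here `conv2` has the widest spread of parameter values and is analysed first, while `fc`, whose parameters are all equal, comes last.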

Specifically, the parameters in the first network layer are quantized with the first quantization bit width and with the second quantization bit width respectively, while the parameters of the other network layers are left unquantized, which gives two quantized target detection models. Model performance evaluation is performed on the two quantized target detection models to obtain two performance evaluation results, and the difference between the two results determines whether the first or the second quantization bit width is appropriate for that layer. The remaining network layers are processed in the same way, and the mixed quantization bit width is obtained once the quantization bit widths of all the network layers have been determined.

In the present embodiment, the first quantization bit width is INT8 quantization bit width, and the second quantization bit width is INT16 quantization bit width.
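To make the single-layer comparison concrete, the sketch below builds the two candidate models for one layer: only that layer is quantized, once at 8 bits and once at 16 bits, and every other layer keeps its original values. The model representation (a dictionary of parameter lists), the symmetric quantize-then-dequantize helper `fake_quantize`, and the function `quantize_single_layer` are hypothetical simplifications, not the procedure of the source.

```python
def fake_quantize(values, bits):
    """Quantize to signed `bits`-bit integers, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    scale = max_abs / qmax
    return [round(v / scale) * scale for v in values]


def quantize_single_layer(model, layer_name, bits):
    """Return a copy of `model` in which only `layer_name` is quantized."""
    candidate = {name: list(params) for name, params in model.items()}
    candidate[layer_name] = fake_quantize(model[layer_name], bits)
    return candidate


# Hypothetical model: layer name -> flat list of float parameters.
model = {"conv1": [0.31, -0.74, 0.05], "conv2": [1.20, -0.40, 0.66]}
first_model = quantize_single_layer(model, "conv1", bits=8)     # INT8 candidate
second_model = quantize_single_layer(model, "conv1", bits=16)   # INT16 candidate
```

Because each candidate is a copy, the original model stays untouched and can be reused when the next layer is analysed.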

In addition, it should be noted that, in this embodiment of the present invention, the predefined quantization bit width set includes only a first quantization bit width and a second quantization bit width. In other embodiments of the present invention, the predefined quantization bit width set may include a fifth quantization bit width INT16, a fourth quantization bit width INT8, a third quantization bit width INT4, a second quantization bit width INT2, and a first quantization bit width INT1, which is not limited in the present invention.

The target detection method provided by the embodiment of the invention can automatically determine the corresponding quantization bit width of each network layer and can shorten the time consumption of quantization.

Further, the sequentially performing quantization sensitivity analysis on each network layer based on the network layer parameter variance and a predefined quantization bit width set, and determining a quantization bit width corresponding to each network layer from the predefined quantization bit width set, so as to obtain a quantization bit width set corresponding to each network layer as a mixed quantization bit width, includes:

S1, sorting the network layer parameter variances in descending order to obtain a variance sequence, and taking the corresponding network layers in turn as the network layer to be quantized according to the variance sequence.

In this step, the network layer parameter variances are sorted in descending numerical order to obtain a variance sequence. The position of each network layer parameter variance in this sequence is the position at which its network layer undergoes quantization sensitivity analysis, and the network layers are taken in turn, in that order, as the network layer to be quantized.

S2, quantizing the network layer to be quantized by using the first quantization bit width and the second quantization bit width respectively, and correspondingly obtaining a first quantized target detection model and a second quantized target detection model.

In this step, the network layer to be quantized is quantized by using the first quantization bit width and the second quantization bit width, and other network layers remain unchanged, so as to obtain a first quantized target detection model corresponding to the first quantization bit width and a second quantized target detection model corresponding to the second quantization bit width.

S3, respectively evaluating the performances of the first quantized target detection model and the second quantized target detection model, and correspondingly obtaining a first model performance and a second model performance.

In the present embodiment, the performance evaluation evaluates the accuracy (ACC) of the quantized target detection model. In other embodiments of the present invention, the performance evaluation may also evaluate other performance measures of the model, such as the false positive rate, precision, or recall, which is not limited in the present invention.
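The measures named above have standard definitions in terms of true and false positives and negatives. The sketch below computes them from four counts; how detections are matched to labels to produce those counts is not specified in the source, so the counts, the function name `detection_metrics`, and the convention of returning 0.0 for an empty denominator are assumptions.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard classification-style measures from confusion counts."""
    def ratio(numerator, denominator):
        # Assumed convention: report 0.0 when a measure is undefined.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Hypothetical counts from evaluating one quantized model.
metrics = detection_metrics(tp=80, fp=10, tn=95, fn=15)
```

Whichever measure is chosen, the same measure has to be used for both quantized models so that their difference is meaningful.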

S4, according to the difference between the first model performance and the second model performance, determining the quantization bit width corresponding to the network layer to be quantized.

In this step, the difference between the first model performance and the second model performance is compared with a preset performance difference threshold. If the difference does not exceed the threshold, the choice between the first and the second quantization bit width has little effect on model performance, so the first quantization bit width can be selected for the network layer to be quantized; this preserves model performance while quantizing to the greatest extent. If the difference exceeds the threshold, the choice of bit width has a large effect on model performance, and the second quantization bit width must be selected for the network layer to be quantized so that model performance does not degrade beyond the preset range.

Steps S2 to S4 are repeated (step S5) for each network layer in the order given by the variance sequence, so that the quantization bit width of every network layer is determined in turn. The resulting quantization bit width set of the whole target detection model is the mixed quantization bit width.
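Taken together, steps S1 to S5 amount to a single pass over the layers, which can be sketched as follows. The `evaluate(layer_name, bits)` callback is hypothetical: it stands for quantizing only the named layer at the given bit width and returning the resulting model performance. Population variance, the sign convention of the difference (second minus first), the 5% threshold, and the sample accuracies are likewise assumptions.

```python
from statistics import pvariance


def choose_bit_widths(layer_params, evaluate, first_bits=8, second_bits=16,
                      threshold=0.05):
    # S1: order the layers by parameter variance, largest first.
    order = sorted(layer_params,
                   key=lambda name: pvariance(layer_params[name]),
                   reverse=True)
    chosen = {}
    for name in order:                      # S5: repeat S2 to S4 for every layer
        # S2 + S3: quantize only this layer at each bit width and evaluate.
        first_perf = evaluate(name, first_bits)
        second_perf = evaluate(name, second_bits)
        # S4: keep the wider format only if it gains more than the threshold.
        difference = second_perf - first_perf
        chosen[name] = second_bits if difference > threshold else first_bits
    return chosen


# Hypothetical accuracies standing in for real evaluation runs.
accuracy_table = {
    ("conv1", 8): 0.80, ("conv1", 16): 0.90,
    ("conv2", 8): 0.89, ("conv2", 16): 0.90,
}
layer_params = {"conv1": [1.0, -1.0, 2.0, -2.0], "conv2": [0.1, -0.1, 0.1, -0.1]}
mixed = choose_bit_widths(layer_params, lambda n, b: accuracy_table[(n, b)])
```

The pass needs two evaluations per layer and no retraining, which is consistent with the short quantization time the method aims for.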

The target detection method provided by the embodiment of the invention can automatically determine the corresponding quantization bit width of each network layer and can shorten the time consumed by quantization.

Further, the step S4, determining a quantization bit width corresponding to the network layer to be quantized according to the difference between the first model performance and the second model performance, includes:

Specifically, assume that the preset performance difference threshold is 5%. If the difference between the first model performance and the second model performance exceeds the preset performance difference threshold, the performance of the model quantized with the second quantization bit width differs greatly from that of the model quantized with the first quantization bit width, and quantizing with the first quantization bit width would seriously degrade the model performance; therefore, to keep the model performance within an ideal range, the quantization bit width of the network layer to be quantized is determined to be the second quantization bit width. If the difference does not exceed the preset performance difference threshold, the second quantization bit width and the first quantization bit width affect the model performance similarly, and the first quantization bit width can be used as the quantization bit width of the network layer to be quantized, so that the model performance is ensured and the target detection model is quantized to the maximum extent.

The target detection method provided by the embodiment of the invention can automatically determine the corresponding quantization bit width of each network layer, shorten the time consumed by quantization, and quantize to the maximum extent while ensuring the model performance, thereby realizing model compression.
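As a worked illustration of the threshold rule with the 5% example value, the sketch below applies the decision to two pairs of performance values. The function name `choose_bit_width`, the use of an absolute difference, and the reading of 5% as 0.05 on a 0 to 1 accuracy scale are assumptions of this example.

```python
def choose_bit_width(
    first_perf: float,
    second_perf: float,
    first_bits: int,
    second_bits: int,
    threshold: float = 0.05,  # the 5% example value, read as 0.05 (assumption)
) -> int:
    """Return the bit width selected for one network layer."""
    difference = abs(first_perf - second_perf)
    if difference > threshold:
        # The first bit width degrades performance too much.
        return second_bits
    # Both bit widths perform similarly, so the first one is kept.
    return first_bits


# Difference 0.08 exceeds 0.05, so the second bit width is selected.
large_gap = choose_bit_width(0.82, 0.90, first_bits=8, second_bits=16)
# Difference 0.02 does not exceed 0.05, so the first bit width is selected.
small_gap = choose_bit_width(0.88, 0.90, first_bits=8, second_bits=16)
print(large_gap, small_gap)
```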

In addition, in order to further save time and cost for quantization, the present invention provides another embodiment, where the performing quantization sensitivity analysis on each network layer in sequence based on the network layer parameter variance and a predefined quantization bit width set, and determining a quantization bit width corresponding to each network layer from the predefined quantization bit width set, so as to obtain a quantization bit width set corresponding to each network layer as a mixed quantization bit width includes:

S10, sorting the network layer parameter variances in descending order to obtain a variance sequence, and taking the corresponding network layers in turn as the network layer to be quantized according to the variance sequence.

In this step, assuming that the target detection model has n network layers, n network layer parameter variances are obtained through the corresponding calculation and sorted in descending order to obtain a variance sequence. The network layer corresponding to the first network layer parameter variance in the variance sequence is used as the network layer to be quantized in the first round of quantization bit width determination, the network layer corresponding to the second network layer parameter variance is used in the second round, and so on, so that the network layer corresponding to the i-th network layer parameter variance is used as the network layer to be quantized in the i-th round of quantization bit width determination.

S20, quantizing the network layer to be quantized by using the first quantization bit width and the second quantization bit width respectively, and correspondingly obtaining a first quantized target detection model and a second quantized target detection model.

S30, evaluating the performance of the first quantized target detection model and the second quantized target detection model respectively, and correspondingly obtaining the first model performance and the second model performance.

Under the condition that the difference between the first model performance and the second model performance exceeds a preset performance difference threshold, determining the quantization bit width corresponding to the network layer to be quantized as the second quantization bit width, determining a new network layer to be quantized according to the variance sequence, and repeating the steps from S20 to S40.

Specifically, assuming that the preset performance difference threshold is 5%, under the condition that the difference exceeds the preset performance difference threshold, the quantization bit width of the network layer to be quantized is determined to be the second quantization bit width, the network layer corresponding to the (i+1)-th network layer parameter variance in the variance sequence is taken as the new network layer to be quantized, and S20 to S40 are repeated to determine its quantization bit width.

Under the condition that the difference does not exceed the preset performance difference threshold, the quantization bit width corresponding to the network layer to be quantized is determined to be the first quantization bit width. In addition, according to the sequence number i of the network layer parameter variance corresponding to this network layer in the variance sequence, the quantization bit widths of the network layers corresponding to the network layer parameter variances with sequence numbers i+1 to n are all set to the first quantization bit width, so as to obtain the mixed quantization bit width of the whole target detection model.

According to the target detection method provided by the embodiment of the invention, the quantization bit width of each network layer is rapidly determined by means of the variance sequence and the comparison with the performance difference threshold, so that a mixed quantization format for the target detection model is generated, the time consumed by quantization is shortened, and quantization to the maximum extent is achieved while ensuring the model performance, thereby realizing model compression.
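The variance-ordered search with early termination can be sketched as follows. This is an illustration under stated assumptions, not the definitive implementation: the names `variance_sequence`, `select_by_variance` and `evaluate` are hypothetical, population variance (`statistics.pvariance`) is assumed for the network layer parameter variance, the first and second bit widths are assumed to be 8 and 16 bits, and the difference is taken as an absolute value.

```python
from statistics import pvariance
from typing import Callable, Dict, List, Sequence


def variance_sequence(layer_params: Dict[str, Sequence[float]]) -> List[str]:
    """Layer names ordered by parameter variance, largest first (step S10)."""
    return sorted(
        layer_params, key=lambda name: pvariance(layer_params[name]), reverse=True
    )


def select_by_variance(
    layer_params: Dict[str, Sequence[float]],
    evaluate: Callable[[str, int], float],
    first_bits: int = 8,      # assumed lower bit width
    second_bits: int = 16,    # assumed higher bit width
    threshold: float = 0.05,  # assumed preset performance difference threshold
) -> Dict[str, int]:
    order = variance_sequence(layer_params)
    mixed: Dict[str, int] = {}
    for i, name in enumerate(order):
        # Steps S20 and S30: evaluate the model at both bit widths for this layer.
        difference = abs(evaluate(name, second_bits) - evaluate(name, first_bits))
        if difference > threshold:
            mixed[name] = second_bits
            continue
        # Within the threshold: this layer and all later layers in the
        # variance sequence receive the first bit width without evaluation.
        for remaining in order[i:]:
            mixed[remaining] = first_bits
        break
    return mixed


# Toy parameters and scores (illustrative only).
params = {
    "stem": [-0.1, 0.1, -0.1, 0.1],
    "head": [-2.0, 2.0, -2.0, 2.0],
    "backbone": [-0.5, 0.5, -0.5, 0.5],
    "neck": [-1.0, 1.0, -1.0, 1.0],
}
scores = {("head", 8): 0.70, ("head", 16): 0.90,
          ("neck", 8): 0.88, ("neck", 16): 0.90}
calls: List[str] = []


def toy_evaluate(name: str, bits: int) -> float:
    calls.append(name)
    return scores[(name, bits)]  # only "head" and "neck" are ever looked up


result = select_by_variance(params, toy_evaluate)
print(variance_sequence(params), result, len(calls))
```

In the toy run only the two highest-variance layers are evaluated; the search stops at the first layer whose difference is within the threshold, which is where the saving in quantization time comes from.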

Further, the quantizing the target detection model based on the mixed quantization bit width to obtain a quantized target detection model includes:

In this embodiment, the target detection model is quantized by quantization-aware training, that is, the network is trained during the quantization process so that the network parameters can better adapt to the information loss caused by quantization. This can usually be realized by existing quantization-aware training methods such as PACT, DoReFa and LSQ, which is not limited in the present invention.

According to the target detection method provided by the embodiment of the invention, the target detection model is quantized by quantization-aware training, and a quantized model with higher accuracy can be obtained.
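To make the information loss mentioned above concrete, the following sketch shows a generic quantize-then-dequantize ("fake quantization") step of the kind commonly simulated in the forward pass during quantization-aware training. It is not an implementation of PACT, DoReFa or LSQ; the symmetric uniform scheme, the per-list scale and the function name `fake_quantize` are assumptions chosen for illustration.

```python
from typing import List, Sequence


def fake_quantize(values: Sequence[float], bits: int) -> List[float]:
    """Round values to a signed integer grid of the given bit width and map back."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 127 for 8 bits, 32767 for 16 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)              # nothing to quantize
    scale = max_abs / qmax               # step size of the integer grid
    quantized = []
    for v in values:
        code = max(-qmax, min(qmax, round(v / scale)))  # clamp to the grid
        quantized.append(code * scale)   # back to floating point
    return quantized


values = [-1.0, -0.3, 0.0, 0.4, 1.0]
q8 = fake_quantize(values, 8)
q16 = fake_quantize(values, 16)
err8 = max(abs(a - b) for a, b in zip(values, q8))
err16 = max(abs(a - b) for a, b in zip(values, q16))
print(err8, err16)
```

Training with such a step in the loop exposes the network to the rounding error of the chosen bit width, which is larger for 8 bits than for 16 bits in this example.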

The object detection device provided by the present invention is described below, and the object detection device described below and the object detection method described above may be referred to in correspondence with each other.

Fig. 2 is a schematic structural diagram of a target detection apparatus according to an embodiment of the present invention, and as shown in fig. 2, the target detection apparatus includes:

a mixed quantization determining module 201, configured to perform quantization sensitivity analysis on each network layer in the target detection model, and determine, based on a quantization sensitivity analysis result of each network layer, a quantization bit width corresponding to each network layer as a mixed quantization bit width.

In the module, the quantization sensitivity analysis is carried out on each network layer in the target detection model, so that the quantization bit width of each network layer is determined, and then the mixed quantization bit width consisting of the quantization bit widths corresponding to each network layer is obtained.

In this embodiment, the predefined quantization bit widths include INT16 quantization and INT8 quantization. INT16 quantization and INT8 quantization are performed on the first network layer in the target detection model respectively, and the target detection models obtained with the two quantization bit widths are each evaluated to obtain a performance evaluation value. The two performance evaluation values are compared to determine which of INT16 quantization and INT8 quantization keeps the performance degradation for the first network layer within the predetermined value, thereby determining the quantization bit width corresponding to the first network layer. By analogy, the quantization bit width of each network layer in the target detection model is determined in turn, and the resulting group of quantization bit widths is taken as the mixed quantization bit width.

A model quantization module 202, configured to quantize the target detection model based on the mixed quantization bit width to obtain a quantized target detection model.

In this module, assuming that the mixed quantization bit width is INT16 quantization for the first network layer, INT16 quantization for the second network layer, INT16 quantization for the third network layer, and INT8 quantization for the nth network layer, the float32 parameters in the first network layer are directly quantized into 16-bit parameters, and so on, so that the parameters of each network layer in the whole target detection model are quantized from float32 to the corresponding 16-bit or 8-bit type. The quantized target detection model is thus obtained; it achieves the maximum compression while the model performance is ensured, and can therefore be conveniently applied to devices with limited computing capability, such as smartphones or embedded devices.

And the target detection module 203 is configured to obtain a picture to be detected, and input the quantized target detection model to obtain a corresponding target detection result.

The target detection device provided by the embodiment of the invention determines the quantization bit width corresponding to the network layer by performing quantization sensitivity analysis on each network layer in the target detection model, so that the finally quantized target detection model has ideal performance and can realize maximum model compression, the time consumption of the whole quantization process is short, the target detection model can be smoothly deployed to a mobile terminal or embedded equipment, and the equipment performance and the model performance can be balanced.
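The work of the model quantization module 202 can be illustrated with the following sketch, which converts each layer's floating-point parameters into integer codes at that layer's bit width and estimates the resulting compression relative to float32 storage. The symmetric per-layer scaling and the names `quantize_to_int`, `quantize_model` and `compression_ratio` are assumptions of this example; the estimate ignores the small cost of storing one scale per layer.

```python
from typing import Dict, List, Sequence, Tuple


def quantize_to_int(values: Sequence[float], bits: int) -> Tuple[List[int], float]:
    """Map float parameters to signed integer codes plus a dequantization scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def quantize_model(
    params: Dict[str, Sequence[float]], mixed_bits: Dict[str, int]
) -> Dict[str, Tuple[List[int], float]]:
    # Each layer is quantized with its own entry of the mixed bit width.
    return {name: quantize_to_int(v, mixed_bits[name]) for name, v in params.items()}


def compression_ratio(
    params: Dict[str, Sequence[float]], mixed_bits: Dict[str, int]
) -> float:
    original_bits = sum(len(v) * 32 for v in params.values())  # float32 storage
    quantized_bits = sum(len(v) * mixed_bits[name] for name, v in params.items())
    return original_bits / quantized_bits


# Toy two-layer model (illustrative values only).
params = {"layer1": [0.5, -1.0, 0.25, 0.0], "layer2": [0.1, -0.2, 0.2, 0.05]}
mixed_bits = {"layer1": 16, "layer2": 8}
quantized = quantize_model(params, mixed_bits)
ratio = compression_ratio(params, mixed_bits)
print(quantized["layer2"][0], ratio)
```

With half of the parameters at 16 bits and half at 8 bits, the toy model needs 96 bits instead of 256, which shows how assigning the lower bit width to more layers increases compression.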

Fig. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in fig. 3, the electronic device may include: a processor 310, a communication interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may call the logic instructions in the memory 330 to execute the above object detection method. The logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

In another aspect, the present invention also provides a computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of executing an object detection method, the object detection method comprising:

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the above object detection method.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method of object detection, comprising:

2. The target detection method of claim 1, wherein the performing quantization sensitivity analysis on each network layer in the target detection model, and determining a quantization bit width corresponding to each network layer as a mixed quantization bit width based on a quantization sensitivity analysis result of each network layer comprises:

3. The method for detecting the target according to claim 2, wherein the obtaining of the network parameters of each network layer in the target detection model and analyzing the dispersion degree of the network parameters to obtain a dispersion analysis result includes:

4. The object detection method according to claim 3, wherein the performing quantization sensitivity analysis on each network layer in turn based on the network layer parameter variance and a predefined quantization bit width set, and determining a quantization bit width corresponding to each network layer from the predefined quantization bit width set, thereby obtaining the quantization bit width set corresponding to each network layer as a mixed quantization bit width comprises:

5. The target detection method of claim 4, wherein the step S4, determining the quantization bit width corresponding to the network layer to be quantized according to the difference between the first model performance and the second model performance, comprises:

6. The object detection method according to claim 3, wherein the performing quantization sensitivity analysis on each network layer in sequence based on the network layer parameter variance and a predefined quantization bit width set, and determining a quantization bit width corresponding to each network layer from the predefined quantization bit width set to obtain a quantization bit width set corresponding to each network layer as a mixed quantization bit width comprises:

7. The target detection method according to any one of claims 1 to 6, wherein the quantizing the target detection model based on the mixed quantization bit width to obtain a quantized target detection model comprises:

8. An object detection device, comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the object detection method according to any one of claims 1 to 7 when executing the program.

10. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the object detection method according to any one of claims 1 to 7.

11. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the object detection method of any one of claims 1 to 7.