CN115018070A - Neural network quantization method, target detection method and device


Info

Publication number: CN115018070A
Authority: CN (China)
Prior art keywords: network layer, quantized, output data, target, network
Legal status: Pending
Application number: CN202210603119.2A
Other languages: Chinese (zh)
Inventors: 黄洋逸, 杨国润, 卢乐炜, 王哲
Current and original assignee: Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Application filed by Shanghai Sensetime Lingang Intelligent Technology Co Ltd; priority to CN202210603119.2A; publication of CN115018070A


Classifications

    • G06N 3/02, G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/774: Arrangements for image or video recognition using machine learning; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82: Arrangements for image or video recognition using neural networks
    • G06V 20/56: Context or environment of the image exterior to a vehicle, using sensors mounted on the vehicle
    • G06V 2201/07: Indexing scheme for image or video recognition or understanding; target detection


Abstract

The present disclosure provides a neural network quantization method, a target detection method, and an apparatus. The neural network quantization method includes: acquiring a neural network to be quantized and training data, and quantizing the network layers to be quantized of the neural network to be quantized to obtain an initial quantization network; for a first target network layer to be adjusted in the initial quantization network, processing the training data based on the already-adjusted second target network layer before the first target network layer to obtain intermediate output data, and processing the intermediate output data based on the first target network layer to obtain first output data; processing the training data based on the network layers to be quantized in the neural network to be quantized that correspond to the first target network layer and the second target network layer, to obtain second output data; and adjusting the network parameters of the first target network layer based on the first output data and the second output data to obtain a quantized target neural network corresponding to the network to be quantized.

Description

Quantization method for a neural network, target detection method, and device
Technical Field
The present disclosure relates to the technical field of neural networks, and in particular, to a quantization method, a target detection method, and an apparatus for a neural network.
Background
Deep learning is computation-intensive. As processing tasks grow more diverse and complex, the demands on algorithm accuracy and real-time performance keep rising, so neural networks grow ever larger and require more computation and storage resources, which puts pressure on their deployment.
In the related art, neural networks are often compressed by quantization. However, quantizing a neural network greatly reduces its accuracy, so recovering the network accuracy lost to quantization is one of the problems to be solved in this field.
Disclosure of Invention
The embodiment of the disclosure at least provides a quantization method of a neural network, a target detection method and a target detection device.
In a first aspect, an embodiment of the present disclosure provides a quantization method for a neural network, including:
acquiring a neural network to be quantized and training data, and quantizing a network layer to be quantized of the neural network to be quantized to obtain an initial quantization network;
processing the training data to obtain intermediate output data based on a first target network layer to be adjusted in the initial quantization network and an adjusted second target network layer before the first target network layer, and processing the intermediate output data based on the first target network layer to obtain first output data;
processing the training data to obtain second output data based on the network layers to be quantized, which correspond to the first target network layer and the second target network layer respectively, in the neural network to be quantized;
and adjusting the network parameters of the first target network layer based on the first output data and the second output data to obtain a quantized target neural network corresponding to the network to be quantized.
In this way, after the neural network to be quantized is quantized to obtain an initial quantization network, for a first target network layer in the initial quantization network, the training data is processed based on the already-adjusted second target network layer before the first target network layer to obtain intermediate output data, and the first output data of the first target network layer is determined based on the intermediate output data; the training data is then processed based on the network layers to be quantized in the neural network to be quantized that correspond to the first target network layer and the second target network layer, to obtain second output data; and the first target network layer is adjusted based on the first output data and the second output data to obtain a quantized target neural network corresponding to the network to be quantized. The target network layer can thus be adjusted without using the annotation data corresponding to the training data, which saves the labor cost required for training. In addition, since the adjustment of the first target network layer incorporates the output data of the already-adjusted second target network layer, the error introduced by adjusting the second target network layer can be taken into account, reducing error accumulation and improving the network accuracy of the quantized target neural network.
In a possible embodiment, before performing quantization processing on the to-be-quantized network layer of the to-be-quantized neural network, the method further includes:
and performing preliminary range adjustment on network parameters of a to-be-quantized network layer of the to-be-quantized neural network based on boundary parameters, wherein the boundary parameters are training parameters in an initial training process of the to-be-quantized neural network.
In this way, by using the boundary parameter to perform preliminary range adjustment on the network layer to be quantized before quantization processing, the adjusted network parameter and the variation of each layer output during quantization can be more stable, thereby reducing potential quantization loss.
In a possible implementation manner, the quantizing the to-be-quantized network layer of the to-be-quantized neural network includes:
quantizing at least one of a weight value, a bias value, and an activation value of the network layer to be quantized.
In a possible implementation manner, the quantizing the to-be-quantized network layer of the to-be-quantized neural network to obtain an initial quantization network includes:
for any network layer to be quantized, quantizing the network layer to be quantized based on a plurality of preset quantization parameters corresponding to the network layer to be quantized respectively to obtain a plurality of network layers to be screened after quantization processing corresponding to the network layer to be quantized;
and respectively determining the quantization loss corresponding to each network layer to be screened, and determining a quantized target network layer corresponding to the network layer to be quantized from the plurality of network layers to be screened on the basis of the quantization loss, wherein the quantized target network layer corresponding to each network layer to be quantized forms the initial quantization network.
Therefore, the quantization processing is carried out on the same network layer to be quantized by using a plurality of quantization parameters, and the network layer to be screened obtained after the quantization processing is screened on the basis of the quantization loss, so that the quantization loss of each target network layer in the finally obtained initial quantization network is smaller, and the quantization effect of the neural network to be quantized in the quantization processing stage is improved.
In a possible implementation, the quantization loss corresponding to the network layer to be screened is determined according to the following method:
determining the quantization loss based on the output data of the network layer to be screened and the output data of the network layer to be quantized corresponding to the network layer to be screened;
the output data of the network layer to be screened is determined based on the output data of the target network layer which is subjected to quantization processing before the network layer to be screened.
Therefore, when the quantization loss is calculated, the output data of the network layer to be screened is determined based on the output data of the already-quantized target network layers before it, so the quantization loss determined for the current network layer to be screened accumulates the losses of the quantization steps preceding it. Based on this quantization loss, the positions in the quantization process where the loss is large can be located, and the quantization processing can be adjusted in time to obtain a better quantization effect.
In a possible embodiment, the adjusting the network parameter of the first target network layer based on the first output data and the second output data includes:
adjusting a bias value of the first target network layer based on the first output data and the second output data; and/or adjusting the weight value of the first target network layer based on the first output data, the second output data and a rounding mask parameter to be trained, wherein the rounding mask parameter is used for rounding the weight value of the first target network layer.
Therefore, the first target network layer can be adjusted without introducing additional labeling information, and the labor cost required for adjusting the network parameters (namely training) is saved.
In one possible embodiment, the adjusting the bias value of the first target network layer based on the first output data and the second output data includes:
determining a bias adjustment value corresponding to the first target network layer based on the first output data and the second output data;
adjusting a bias value of the first target network layer based on the bias adjustment value;
determining a first loss value of the first target network layer, and updating the first output data based on the first target network layer after the offset value is adjusted under the condition that the first loss value does not meet a first preset condition;
and returning to the step of determining the offset adjustment value based on the updated first output data and the second output data until the first loss value meets the first preset condition.
In one possible embodiment, the training data includes a sample image, the first output data includes a first feature map, and the second output data includes a second feature map;
the determining a bias adjustment value corresponding to the first target network layer based on the first output data and the second output data includes:
taking the difference between the first mean value of each channel in the second feature map and the second mean value of each channel in the first feature map as the bias adjustment value; or,
determining the bias adjustment value based on the values of each channel of the first feature map and the second feature map after activation processing, together with a mask, wherein the mask is used for screening the channel values of the first feature map and the second feature map when the bias adjustment value is calculated, and the mask is determined based on the boundary parameter of the first target network layer used in the preliminary range adjustment.
In this way, by using the mask to screen the channel values participating in the calculation of the first mean value and the second mean value, the channel adjusted in the preliminary range adjustment stage can be excluded from the calculation process, so that the influence of the preliminary range adjustment operation on the determination of the offset adjustment value can be avoided, and the accuracy of the offset adjustment value can be ensured.
In a possible implementation, the adjusting the weight value of the first target network layer based on the first output data, the second output data and rounding mask parameters to be trained includes:
rounding up each weight value in the first target network layer based on the quantization parameter corresponding to the first target network layer, and determining a target weight value corresponding to each weight value;
updating target weight values corresponding to the weight values respectively based on the rounding mask parameters, and updating the first output data based on the updated weight values;
and determining a second loss value based on the updated first output data and the second output data, and returning to the step of determining the target weight value under the condition that the second loss value does not meet a second preset condition until the second loss value meets the second preset condition.
Therefore, the weight value of the first target network layer is adjusted by introducing the rounding mask parameter, and the network precision of the finally obtained target neural network can be improved.
In a possible implementation manner, after obtaining a quantized target neural network corresponding to the network to be quantized, the method further includes:
and training the target neural network based on the training data and the labeled data of the training data.
Therefore, the network precision of the target neural network can be further improved by training the target neural network by adopting the training data with the labeled data.
In a second aspect, an embodiment of the present disclosure further provides a target detection method, including:
acquiring point cloud data to be detected;
detecting the point cloud data to be detected based on the target neural network obtained by quantization according to the quantization method of the neural network in the first aspect, and determining the detection result of the point cloud data to be detected;
and controlling the target vehicle to run based on the detection result.
In a third aspect, an embodiment of the present disclosure further provides a quantization apparatus for a neural network, including:
the quantization module is used for acquiring a neural network to be quantized and training data, and performing quantization processing on a network layer to be quantized of the neural network to be quantized to obtain an initial quantization network;
a determining module, configured to, for a first target network layer to be adjusted in the initial quantization network, process the training data to obtain intermediate output data based on an adjusted second target network layer before the first target network layer, and process the intermediate output data based on the first target network layer to obtain first output data;
the processing module is used for processing the training data to obtain second output data based on network layers to be quantized, which correspond to the first target network layer and the second target network layer respectively, in the neural network to be quantized;
and the adjusting module is used for adjusting the network parameters of the first target network layer based on the first output data and the second output data to obtain a quantized target neural network corresponding to the network to be quantized.
In a possible embodiment, before performing quantization processing on the to-be-quantized network layer of the to-be-quantized neural network, the quantization module is further configured to:
and performing preliminary range adjustment on network parameters of a to-be-quantized network layer of the to-be-quantized neural network based on boundary parameters, wherein the boundary parameters are training parameters in an initial training process of the to-be-quantized neural network.
In one possible embodiment, the quantization module, when performing quantization processing on a to-be-quantized network layer of the to-be-quantized neural network, is configured to:
quantizing at least one of a weight value, a bias value, and an activation value of the network layer to be quantized.
In a possible implementation manner, when the quantization module performs quantization processing on the to-be-quantized network layer of the to-be-quantized neural network to obtain an initial quantization network, the quantization module is configured to:
for any network layer to be quantized, quantizing the network layer to be quantized based on a plurality of preset quantization parameters corresponding to the network layer to be quantized respectively to obtain a plurality of network layers to be screened after quantization processing corresponding to the network layer to be quantized;
and respectively determining the quantization loss corresponding to each network layer to be screened, and determining a quantized target network layer corresponding to the network layer to be quantized from the plurality of network layers to be screened based on the quantization loss, wherein the quantized target network layer corresponding to each network layer to be quantized forms the initial quantization network.
In a possible implementation manner, the quantization module is configured to determine a quantization loss corresponding to the network layer to be screened according to the following method:
determining the quantization loss based on the output data of the network layer to be screened and the output data of the network layer to be quantized corresponding to the network layer to be screened;
the output data of the network layer to be screened is determined based on the output data of the target network layer which is subjected to quantization processing before the network layer to be screened.
In a possible implementation, the adjusting module, when adjusting the network parameter of the first target network layer based on the first output data and the second output data, is configured to:
adjusting a bias value of the first target network layer based on the first output data and the second output data; and/or adjusting the weight value of the first target network layer based on the first output data, the second output data and a rounding mask parameter to be trained, wherein the rounding mask parameter is used for rounding the weight value of the first target network layer.
In a possible implementation, the adjusting module, when adjusting the bias value of the first target network layer based on the first output data and the second output data, is configured to:
determining a bias adjustment value corresponding to the first target network layer based on the first output data and the second output data;
adjusting a bias value of the first target network layer based on the bias adjustment value;
determining a first loss value of the first target network layer, and updating the first output data based on the first target network layer after the offset value is adjusted under the condition that the first loss value does not meet a first preset condition;
and returning to the step of determining the offset adjustment value based on the updated first output data and the second output data until the first loss value meets the first preset condition.
In one possible embodiment, the training data includes a sample image, the first output data includes a first feature map, and the second output data includes a second feature map;
the adjusting module, when determining the bias adjustment value corresponding to the first target network layer based on the first output data and the second output data, is configured to:
taking the difference between the first mean value of each channel in the second feature map and the second mean value of each channel in the first feature map as the bias adjustment value; or,
determining the bias adjustment value based on the values of each channel of the first feature map and the second feature map after activation processing, together with a mask, wherein the mask is used for screening the channel values of the first feature map and the second feature map when the bias adjustment value is calculated, and the mask is determined based on the boundary parameter of the first target network layer used in the preliminary range adjustment.
In a possible implementation, the adjusting module, when adjusting the weight value of the first target network layer based on the first output data, the second output data and rounding mask parameters to be trained, is configured to:
rounding up each weight value in the first target network layer based on the quantization parameter corresponding to the first target network layer, and determining a target weight value corresponding to each weight value;
updating target weight values corresponding to the weight values respectively based on the rounding mask parameters, and updating the first output data based on the updated weight values;
and determining a second loss value based on the updated first output data and the second output data, and returning to the step of determining the target weight value under the condition that the second loss value does not meet a second preset condition until the second loss value meets the second preset condition.
In a possible implementation manner, after obtaining a quantized target neural network corresponding to the network to be quantized, the adjusting module is further configured to:
and training the target neural network based on the training data and the labeled data of the training data.
In a fourth aspect, an embodiment of the present disclosure further provides an object detection apparatus, including:
the acquisition module is used for acquiring point cloud data to be detected;
a detection module, configured to detect the point cloud data to be detected based on a target neural network obtained by quantization according to the quantization method for a neural network in any one of the first aspect, and determine a detection result of the point cloud data to be detected;
and the control module is used for controlling the target vehicle to run based on the detection result.
In a fifth aspect, an embodiment of the present disclosure further provides a computer device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the computer device is running, the machine-readable instructions when executed by the processor performing the steps of any one of the possible implementations of the first or second aspect.
In a sixth aspect, the disclosed embodiments further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the steps in any one of the possible implementation manners of the first aspect or the second aspect.
For the description of the effects of the quantization apparatus, the computer device, and the computer-readable storage medium of the neural network, reference is made to the description of the quantization method of the neural network, and details are not repeated here.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings here are incorporated in and form a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those of ordinary skill in the art can derive additional related drawings from them without inventive effort.
Fig. 1 illustrates a flow chart of a quantization method of a neural network provided by an embodiment of the present disclosure;
fig. 2 is a flowchart illustrating a specific method for obtaining an initial quantization network in a quantization method of a neural network provided in an embodiment of the present disclosure;
fig. 3 is a flowchart illustrating a specific method for adjusting the bias value in the quantization method of the neural network provided by the embodiment of the present disclosure;
fig. 4 is a flowchart illustrating a specific method for adjusting weight values in a quantization method of a neural network provided in an embodiment of the present disclosure;
FIG. 5 is a flow chart illustrating a method for object detection provided by an embodiment of the present disclosure;
FIG. 6 is a flow chart illustrating another quantization method of a neural network provided by an embodiment of the present disclosure;
fig. 7 is a schematic diagram illustrating adjustment of model parameters in a quantization method of a neural network provided by an embodiment of the present disclosure;
fig. 8 is a schematic diagram illustrating an architecture of a quantization apparatus of a neural network provided in an embodiment of the present disclosure;
fig. 9 is a schematic diagram illustrating an architecture of an object detection apparatus provided in an embodiment of the present disclosure;
fig. 10 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure; obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. The components of the embodiments of the present disclosure, as generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments. All other embodiments obtained by those skilled in the art from the embodiments of the present disclosure without inventive effort shall fall within the protection scope of the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The term "and/or" herein merely describes an associative relationship, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
It has been found through research that, in the related art, neural networks are often compressed by quantization; however, quantizing a neural network greatly reduces its accuracy. How to recover the network accuracy lost to quantization has therefore become one of the problems to be solved in this field.
Based on this research, the present disclosure provides a neural network quantization method, a target detection method, and an apparatus. After the neural network to be quantized is quantized to obtain an initial quantization network, for a first target network layer in the initial quantization network, the training data is processed based on the already-adjusted second target network layer before the first target network layer to obtain intermediate output data, and the first output data of the first target network layer is determined based on the intermediate output data; the training data is then processed based on the network layers to be quantized in the neural network to be quantized that correspond to the first target network layer and the second target network layer, to obtain second output data; and the first target network layer is adjusted based on the first output data and the second output data to obtain a quantized target neural network corresponding to the network to be quantized. The target network layer can thus be adjusted without using the annotation data corresponding to the training data, which saves the labor cost required for training. In addition, since the adjustment of the first target network layer incorporates the output data of the already-adjusted second target network layer, the error introduced by adjusting the second target network layer can be taken into account, reducing error accumulation and improving the network accuracy of the quantized target neural network.
To facilitate understanding of the present embodiment, first, a detailed description is given of a quantization method of a neural network disclosed in an embodiment of the present disclosure, and an execution subject of the quantization method of the neural network provided in the embodiment of the present disclosure is generally a computer device with certain computing power, where the computer device includes, for example: a terminal device, which may be a User Equipment (UE), a mobile device, a User terminal, a terminal, or other processing devices. In some possible implementations, the quantization method of the neural network may be implemented by a processor calling computer-readable instructions stored in a memory.
Referring to fig. 1, which shows a flowchart of a quantization method of a neural network provided by an embodiment of the present disclosure, the method includes S101 to S104:
S101: Obtaining a neural network to be quantized and training data, and quantizing a network layer to be quantized of the neural network to be quantized to obtain an initial quantization network.
S102: and processing the training data to obtain intermediate output data based on a first target network layer to be adjusted in the initial quantization network and a second target network layer which is adjusted before the first target network layer, and processing the intermediate output data to obtain first output data based on the first target network layer.
S103: and processing the training data to obtain second output data based on the network layers to be quantized, which correspond to the first target network layer and the second target network layer respectively, in the neural network to be quantized.
S104: and adjusting the network parameters of the first target network layer based on the first output data and the second output data to obtain a quantized target neural network corresponding to the network to be quantized.
The following is a detailed description of the above steps.
For S101, when quantizing a network layer to be quantized of the neural network to be quantized, the layer may be quantized according to preset quantization parameters. The quantization parameters characterize the quantization range of each network layer to be quantized and may include the range of fixed-point values into which floating-point values are quantized, as well as the quantization bit width used when performing quantization, for example 8 bits or 16 bits. The network layers to be quantized may be all of the network layers of the neural network to be quantized, or only some of them, such as the convolutional layers in the neural network to be quantized.
Exemplarily, taking autonomous driving as the application scenario, the neural network to be quantized may be a point cloud target detection network (the network type may be a SECOND model, a PointPillars model, a PointRCNN model, a PV-RCNN model, etc.) configured to perform target detection on lidar point cloud data acquired by a lidar deployed on an autonomous driving device; the training data may be sample point cloud data collected in an autonomous driving scenario.
Specifically, when the network layer to be quantized of the neural network to be quantized is quantized, at least one of a weight value, a bias value, and an activation value of the network layer to be quantized may be quantized.
Illustratively, for a given floating-point tensor X (which may be at least one of a weight value, a bias value, and an activation value) to be quantized, the quantized fixed-point number tensor Xq may be represented by:
Xq = clamp(round(X / s), qmin, qmax)

where s is the quantization step size and [qmin, qmax] is the preset fixed-point range. The function clamp(X, min, max) truncates the values of the elements in the tensor to a preset closed interval: a value smaller than the lower bound of the preset interval is truncated to the lower bound, and a value larger than the upper bound of the preset interval is truncated to the upper bound.
In one possible implementation, the neural network to be quantified is a point cloud target detection network, and the point cloud target detection network is composed of a plurality of network structures such as a point cloud feature extraction network, a feature fusion network and a prediction head network.
Here, the point cloud feature extraction network is configured to perform feature extraction on input point cloud data to obtain a feature vector (or tensor) corresponding to the point cloud data; the feature fusion network is used for performing feature fusion on the features of the point cloud data extracted by the point cloud feature extraction network, and comprises tensor splicing operation on the input tensor and the like; the prediction head network is used for predicting the characteristics of the fused point cloud data, the prediction head network comprises an activation layer, and the activation layer can be located at the tail end of the prediction head network and is used for activating input data input into the activation layer so as to improve the expression capability of the prediction head network.
Specifically, when the network layer to be quantized is quantized, the same quantization parameters can be used for quantizing network parameters corresponding to tensors requiring tensor splicing operation, so as to reduce quantization loss caused by subsequent tensor splicing operation; in addition, a quantization range can be set for an activation layer at the tail end of the prediction head network according to an experimental result so as to improve the expression capability of the point cloud feature extraction network.
In a possible implementation manner, before performing quantization processing on the to-be-quantized network layer of the to-be-quantized neural network, preliminary range adjustment may be performed on network parameters of the to-be-quantized network layer of the to-be-quantized neural network based on boundary parameters.
Here, the boundary parameter is a training parameter in an initial training process of the neural network to be quantized, and different boundary parameters may be used to perform preliminary range adjustment for different quantization objects in the network layer to be quantized.
Specifically, the initial training of the neural network to be quantized is performed before quantization to improve the network accuracy of the neural network to be quantized. During this initial training, the boundary parameter is added to the loss function, so that the loss value of the neural network to be quantized is determined based on the boundary parameter, and the network layer to be quantized is given a preliminary range adjustment (i.e., its network parameters are adjusted during training) based on that loss value; the boundary parameter may be used to determine the adjustment range when parameters are adjusted during the preliminary range adjustment.
In one possible implementation, a first boundary parameter may be set for the output data of an activation layer in the network layer to be quantized. When performing the preliminary range adjustment, a first value range may be determined based on the first boundary parameter, and data in the output data of the activation layer that falls outside the first value range may be adjusted to the corresponding boundary value of the first value range.
Here, the first boundary parameter may be a learnable parameter that is updated in real time while the preliminary range adjustment is performed, and it may be trained (i.e., updated) using a weight decay mechanism.
For example, taking a first value range determined based on the first boundary parameter as-10 to 10 and data included in the output data as 8, 9, and 11 as an example, since 11 is located outside the first value range, the data may be adjusted to a corresponding boundary value 10, so as to obtain output data of the active layer after the preliminary range adjustment as 8, 9, and 10.
For example, taking the adjustment of the output data of a ReLU activation layer as an example, the output of the ReLU activation layer after the preliminary range adjustment may be represented as:

ClipReLU(X) = min(max(X, 0), α′)

where ClipReLU(X) truncates X to the first value range calculated based on the learnable parameter α, and α′ is the upper bound of the first value range; the specific formula for calculating this upper bound may be set according to the actual use case, which is not limited in the embodiments of the present disclosure.
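A minimal sketch of such a clipped activation with a learnable bound follows; taking α′ to be α itself is an assumption here, since the disclosure leaves the exact formula for the upper bound open:

```python
import torch
import torch.nn as nn

class ClipReLU(nn.Module):
    # ReLU whose output is truncated at a learnable upper bound alpha'.
    # alpha is the learnable first boundary parameter; alpha' = alpha is
    # an assumption, as the exact derivation of the bound is left open.
    def __init__(self, init_alpha: float = 6.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(init_alpha))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Truncate below at 0 (standard ReLU) and above at alpha'.
        return torch.minimum(torch.relu(x), self.alpha)
```

Registering self.alpha as a parameter lets an optimizer with non-zero weight decay train it, matching the weight-decay mechanism mentioned above.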
In another possible implementation, a second boundary parameter may be further set for the weight value, and when performing the preliminary range adjustment, a second value range corresponding to the preliminary range adjustment of the network layer to be quantized may be determined based on the second boundary parameter and the weight value of any network layer to be quantized, and data outside the second value range in the weight value may be adjusted to a boundary value corresponding to the second value range.
Here, the second boundary parameter may be a parameter set in advance according to an experimental result.
Specifically, when the second numerical value range is determined, the second numerical value range corresponding to the preliminary range adjustment of the network layer to be quantized may be determined by using the average of the absolute values of the weight values of any network layer to be quantized and the second boundary parameter.
For example, taking the weighted values of the network layer to be quantized as 1, 2, 3, and 4 and the second boundary parameter as 1.2 as an example, it may be determined that the mean value of the absolute values of the weighted values of the network layer to be quantized is 2.5, the second numerical range may be determined as-3 to 3 according to the product of the mean value and the second boundary parameter, and the weighted value 4 may be adjusted to 3, so as to obtain the weighted values of the network layer to be quantized after the preliminary range adjustment as 1, 2, 3, and 3.
For example, taking the adjustment of the weight of the convolutional layer as an example, the weight of the convolutional layer after the preliminary range adjustment may be represented as:
W′=clamp(W,-avg(|W|)·β,avg(|W|)·β)
where β is the second boundary parameter, avg represents the averaging, and W represents the weight value of the convolutional layer before adjustment.
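This per-layer weight clamp translates directly into code; a sketch, with β a preset hyperparameter as described:

```python
import torch

def preliminary_weight_clamp(w: torch.Tensor, beta: float) -> torch.Tensor:
    # Second value range: [-avg(|W|) * beta, avg(|W|) * beta].
    bound = w.abs().mean() * beta
    return torch.clamp(w, -bound, bound)
```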
In this way, by using the boundary parameter to perform preliminary range adjustment on the network layer to be quantized before quantization processing, the adjusted network parameter and the variation of each layer output during quantization can be more stable, thereby reducing potential quantization loss.
In one possible implementation, as shown in fig. 2, the initial quantization network may be obtained by:
S201: For any network layer to be quantized, quantizing the network layer to be quantized based on a plurality of preset quantization parameters corresponding to the network layer to be quantized, respectively, to obtain a plurality of quantized network layers to be screened corresponding to the network layer to be quantized.
S202: and respectively determining the quantization loss corresponding to each network layer to be screened, and determining a quantized target network layer corresponding to the network layer to be quantized from the plurality of network layers to be screened based on the quantization loss, wherein the quantized target network layer corresponding to each network layer to be quantized forms the initial quantization network.
Here, the quantization loss may be characterized by using a similarity index between output data before and after quantization processing by the network layer, where the similarity index may be an index capable of characterizing similarity, such as an L1 distance, a cosine distance, and the like.
Therefore, the quantization processing is carried out on the same network layer to be quantized by using a plurality of quantization parameters, and the network layer to be screened obtained after the quantization processing is screened on the basis of the quantization loss, so that the quantization loss of each target network layer in the finally obtained initial quantization network is smaller, and the quantization effect of the neural network to be quantized in the quantization processing stage is improved.
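A sketch of this per-layer search over candidate quantization parameters; the cosine-similarity loss is one of the indices named above, and make_quantized is a hypothetical helper standing in for whatever produces a quantized copy of a layer:

```python
import torch
import torch.nn.functional as F

def cosine_quant_loss(y_fp: torch.Tensor, y_q: torch.Tensor) -> float:
    # Quantization loss as one minus the cosine similarity of the outputs.
    return 1.0 - F.cosine_similarity(y_fp.flatten(), y_q.flatten(), dim=0).item()

def select_target_layer(layer_fp, candidate_params, make_quantized, x):
    # Quantize one layer under each candidate parameter set and keep the
    # candidate whose output stays closest to the unquantized output.
    y_fp = layer_fp(x)  # output of the network layer to be quantized
    losses = [(cosine_quant_loss(y_fp, make_quantized(layer_fp, p)(x)), p)
              for p in candidate_params]
    return min(losses, key=lambda t: t[0])  # (loss, chosen parameters)
```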
In a possible implementation manner, when determining the quantization loss corresponding to the network layer to be screened, the quantization loss may be determined based on the output data of the network layer to be screened and the output data of the network layer to be quantized corresponding to the network layer to be screened.
Here, the output data of the network layer to be screened is determined based on the output data of the target network layer before the network layer to be screened after quantization processing.
Specifically, the calculation formula for determining the quantization loss may be:

L[i] = similarity(Y[i], Ŷ[i])

where Y[i] denotes the output data of the i-th network layer to be quantized, with none of the network layers to be quantized before the i-th one quantized; Ŷ[i] denotes the output data of the i-th network layer to be screened, with all target network layers before the network layer to be screened already quantized; and similarity denotes the similarity index calculation.
Therefore, when the quantization loss is calculated, the output data of the network layer to be screened is determined based on the output data of the already-quantized target network layers before it, so the quantization loss determined for the current network layer to be screened accumulates the quantization losses of the process preceding it; based on this quantization loss, the positions in the quantization process where the loss is large can be located, and the quantization processing can be adjusted in time to obtain a better quantization effect.
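The two propagation paths implied by this formula can be sketched as follows; loss_fn is caller-supplied (for example, the cosine loss above), and the layer lists are assumed to be callable modules:

```python
def layerwise_quant_losses(fp_layers, q_layers, x, loss_fn):
    # Per-layer quantization loss with error accumulation: the screened
    # branch consumes outputs of the already-quantized layers before it,
    # while the reference branch stays fully unquantized.
    losses = []
    y_fp = y_q = x
    for layer_fp, layer_q in zip(fp_layers, q_layers):
        y_fp = layer_fp(y_fp)  # Y[i]: no preceding layer quantized
        y_q = layer_q(y_q)     # Y-hat[i]: preceding layers quantized
        losses.append(loss_fn(y_fp, y_q))
    return losses
```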
S102: and processing the training data to obtain intermediate output data based on a first target network layer to be adjusted in the initial quantization network and a second target network layer which is adjusted before the first target network layer, and processing the intermediate output data to obtain first output data based on the first target network layer.
Here, when the target network layers in the initial quantized network are adjusted, the adjustment may be performed from front to back according to the connection order of the target network layers in the initial quantized network, and the adjustment of second target network layers before the first target network layer is already completed before determining the output data of the first target network layer.
Specifically, the output data of the second target network layer may be input to the first target network layer to obtain first output data output by the first target network layer, where the second target network layer may include multiple layers, and the output data of the second target network layer input to the first target network layer is output data of a last layer of the second target network layer.
For example, if the first target network layer is the 6 th convolutional layer in the initial quantization network, and the second target network layers are the 1 st to 5 th convolutional layers, the output data of the second target network layer input to the first target network layer is the output data of the 5 th convolutional layer.
S103: and processing the training data to obtain second output data based on the network layers to be quantized, which correspond to the first target network layer and the second target network layer respectively, in the neural network to be quantized.
Here, the second output data is output data of a to-be-quantized network layer corresponding to the first target network layer after the training data is input into the to-be-quantized neural network.
In the above example, still taking the first target network layer as the 6 th convolutional layer in the initial quantization network as an example, the second output data is the output data of the 6 th convolutional layer after the training data is input into the neural network to be quantized.
It should be noted that S102 and S103 may be executed in either order or simultaneously; the execution sequence is not limited in this disclosure.
S104: and adjusting the network parameters of the first target network layer based on the first output data and the second output data to obtain a quantized target neural network corresponding to the network to be quantized.
Here, since adjusting the first target network layer uses only the first output data and the second output data, without any additional label data, the adjustment process is label-free tuning, and using label-free tuning saves the labor cost required for network parameter adjustment (i.e., training).
Specifically, the adjustment may be performed by any one of the following methods:
mode 1, adjusting the bias value of the first target network layer
Here, the training data may include a sample image, the first output data may include a first feature map, and the second output data may include a second feature map; in adjusting the bias value of the first target network layer, the bias value of the first target network layer may be adjusted based on the first output data and the second output data.
In one possible implementation, as shown in fig. 3, the offset value may be adjusted by:
S301: Determining a bias adjustment value corresponding to the first target network layer based on the first output data and the second output data.
Specifically, when determining the offset adjustment value, any one of the following methods may be used:
the method A comprises the following steps: and taking the difference between the first average value of each channel in the second output data and the second average value of each channel in the first output data as the offset adjustment value.
Here, after quantization, the feature map output by the first target network layer may exhibit an offset in the overall distribution of a channel, which causes a large quantization loss. To address this loss, the difference between the first average value of each channel in the second output data (the output data before quantization) and the second average value of each channel in the first output data (the output data after quantization) may be determined and used as the offset adjustment value.
For example, the formula for calculating the bias adjustment value may be:

d[i] = avg(Y[i]) − avg(Ŷ[i])

where d[i] denotes the bias adjustment value corresponding to the i-th layer, i.e., the first target network layer; avg(Y[i]) denotes the first mean value of each channel in the second output data; and avg(Ŷ[i]) denotes the second mean value of each channel in the first output data.
Specifically, this difference represents the offset of the overall channel distribution introduced when the first target network layer is quantized; using it as the offset adjustment value when adjusting the bias reduces the corresponding quantization loss.
Method B: determining the bias adjustment value based on the values of each channel of the first output data and the second output data after activation processing, together with a mask.
Here, the mask is used to filter channel values of the first output data and the second output data when calculating the bias adjustment value, and the mask is determined based on a boundary parameter of the first target network layer when performing a preliminary range adjustment.
Specifically, before quantization, a boundary parameter may be used to determine the value range for the preliminary range adjustment, and data in the output data that falls outside this range is adjusted. Such adjusted data would distort the mean calculation used for the offset adjustment value, so the computed offset adjustment value might no longer accurately represent the shift of the overall channel distribution caused by quantizing the first target network layer.
Specifically, the dimensions of the first feature map may be C × H × W, where C denotes the number of channels of the feature map, H its height, and W its width. The mask has the same dimensions, C × H × W, and its channel values correspond one-to-one to those of the first feature map; each channel value in the mask may be 0 or 1. For any target channel value of the mask: if the corresponding channel value in the first feature map was not adjusted in the preliminary range adjustment stage, the mask value is 1, indicating that the corresponding channel value of the first feature map participates in the mean-difference calculation of method A; if the corresponding channel value in the first feature map was adjusted in the preliminary range adjustment stage, the mask value is 0, indicating that the corresponding channel value of the first feature map does not participate in the mean-difference calculation of method A.
For example, taking the adjustment of a ReLU activation layer as an example, the bias adjustment value may be calculated as:

$$d^{[i]} = \operatorname{avg}\bigl(M \odot (Y^{[i]} - \hat{Y}^{[i]})\bigr)$$

wherein $d^{[i]}$ denotes the bias adjustment value corresponding to the network layer to be adjusted at the i-th layer (namely, the ReLU activation layer); $Y^{[i]} - \hat{Y}^{[i]}$ denotes the per-channel difference; $M$ denotes the mask corresponding to the ReLU activation layer; and $\operatorname{avg}$ denotes the averaging operation.
Thus, building on method A, multiplying by the mask of 0/1 channel values during the mean-difference calculation excludes the channels adjusted in the preliminary range adjustment stage from the calculation, so the preliminary range adjustment does not distort the determination of the bias adjustment value.
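A sketch of method B under the same assumptions is shown below. The boundary value `bound` and the way the mask is rebuilt from the preliminary range adjustment are inferred from the description above and should be read as assumptions; averaging only over unmasked positions is one reasonable reading of "screening the channel values":

```python
import torch

def bias_adjustment_method_b(pre_quant_out: torch.Tensor,
                             post_quant_out: torch.Tensor,
                             bound: float) -> torch.Tensor:
    """Masked per-channel bias adjustment after ReLU activation.

    Positions whose pre-quantization activation reached the boundary
    (and were therefore adjusted in the preliminary range adjustment
    stage) get mask 0 and are excluded; all other positions get mask 1.
    """
    activated_pre = torch.relu(pre_quant_out)      # activation processing
    activated_post = torch.relu(post_quant_out)
    mask = (activated_pre < bound).float()         # 1 = untouched, 0 = adjusted
    diff = mask * (activated_pre - activated_post)
    counts = mask.sum(dim=(1, 2)).clamp(min=1.0)   # avoid division by zero
    return diff.sum(dim=(1, 2)) / counts           # mean over unmasked positions
```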
S302: adjusting a bias value of the first target network layer based on the bias adjustment value.
Here, when the bias value of the first target network layer is adjusted based on the bias adjustment value, the adjustment may be performed gradually in a hot-update manner.
For example, if the bias value of the first target network layer is 1 and the bias adjustment value is 0.5, the hot-update scheme may adjust the bias value to 1.1 in this round rather than directly to 1.5, avoiding the drop in network accuracy that an overly large single-step adjustment could cause.
For example, the bias value may be updated as:

$$b^{[i]} \leftarrow b^{[i]} + \eta \cdot d^{[i]}$$

in which the sum of the original bias value and the bias adjustment value, $b^{[i]} + d^{[i]}$, serves as the target of the adjustment, and the step factor $\eta$ (0.2 in the example above) makes the bias approach that target gradually in the hot-update manner.
S303: and determining a first loss value of the first target network layer, and updating the first output data based on the first target network layer after the offset value is adjusted when the first loss value does not meet a first preset condition.
S304: and returning to the step of determining the offset adjustment value based on the updated first output data and the second output data until the first loss value meets the first preset condition.
Specifically, the first preset condition may be that the number of return executions reaches a preset count. The first loss value may be a similarity index between the output data of the first target network layer before and after adjustment, such as the L1 distance or the cosine distance; its calculation is similar to the determination of the quantization loss in S202 and is not repeated here.
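The first loss value could, for instance, be computed as an L1 or cosine distance between the flattened layer outputs; this is a sketch, since the text leaves the exact metric and reduction open:

```python
import torch
import torch.nn.functional as F

def first_loss(adjusted_out: torch.Tensor, reference_out: torch.Tensor,
               metric: str = "l1") -> torch.Tensor:
    """Similarity-based loss between layer outputs before and after adjustment."""
    if metric == "l1":
        return (adjusted_out - reference_out).abs().mean()
    # Cosine distance: 1 minus the cosine similarity of the flattened maps.
    return 1.0 - F.cosine_similarity(adjusted_out.flatten(),
                                     reference_out.flatten(), dim=0)
```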
Mode 2, adjusting the weight value of the first target network layer
Here, the weight value of the first target network layer may be adjusted based on the first output data, the second output data, and a rounding mask parameter to be trained, where the rounding mask parameter is used for rounding the weight value of the first target network layer.
In one possible implementation, as shown in fig. 4, the weight value may be adjusted by:
S401: Rounding each weight value in the first target network layer based on the quantization parameter corresponding to the first target network layer, and determining a target weight value corresponding to each weight value.
S402: Updating the target weight values corresponding to the respective weight values based on the rounding mask parameter, and updating the first output data based on the updated weight values.
Here, the formula used when rounding each weight value in the first target network layer based on the corresponding quantization parameter may be:

$$\widetilde{W}^{[i]} = \text{tensorScale} \cdot \left( \left\lfloor \frac{W^{[i]}}{\text{tensorScale}} \right\rfloor + h\bigl(M_{\text{rounding}}\bigr) \right)$$

wherein $\widetilde{W}^{[i]}$ denotes a target weight value after rounding processing, $\text{tensorScale}$ denotes the quantization scale, $W^{[i]}$ denotes the weight values without quantization rounding, $M_{\text{rounding}}$ denotes the rounding mask parameter, and $h(\cdot)$ maps the rounding mask parameter into $[0, 1]$, consistent with the rectified sigmoid appearing in the regularization loss below. Rounding of each weight value is completed through this formula, and the target weight value corresponding to each weight value is determined.
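The rounding step can be sketched in the soft-rounding style that the regularization loss below implies; the rectified-sigmoid h(·) and the default ζ, γ values are assumptions consistent with that loss, not a verbatim reproduction of the disclosure's formula:

```python
import torch

def soft_round_weights(weights: torch.Tensor, tensor_scale: float,
                       m_rounding: torch.Tensor,
                       zeta: float = 1.1, gamma: float = -0.1) -> torch.Tensor:
    """Quantize weights with a learnable rounding direction.

    h(M) is a rectified sigmoid in [0, 1]: h near 0 rounds down, h near 1
    rounds up. m_rounding has the same shape as weights.
    """
    h = torch.clamp((zeta - gamma) * torch.sigmoid(m_rounding) + gamma, 0.0, 1.0)
    return tensor_scale * (torch.floor(weights / tensor_scale) + h)
```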
Specifically, the rounding mask parameter may be a learnable parameter that is updated in real time as the target weight values are adjusted, and it may be trained with a loss function whose penalty on the rounding decisions sharpens as the number of training iterations increases (that is, as β is annealed).
For example, the loss function for training the rounding mask parameters may be:
$$L_{\text{reg}} = \lambda \cdot \sum \Bigl( 1 - \bigl| \operatorname{clamp}\bigl((\zeta - \gamma) \cdot \operatorname{sigmoid}(M_{\text{rounding}}) + \gamma,\ 0,\ 1\bigr) - 0.5 \bigr|^{\beta} \Bigr)$$

wherein ζ, γ, β and λ are preset parameters, and β is gradually reduced as the number of training iterations increases.
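Written directly in code, the regularization loss could look as follows; λ and the annealing schedule for β are illustrative:

```python
import torch

def rounding_reg_loss(m_rounding: torch.Tensor, beta: float,
                      lam: float = 0.01, zeta: float = 1.1,
                      gamma: float = -0.1) -> torch.Tensor:
    """L_reg = lambda * sum(1 - |h(M) - 0.5|**beta).

    Penalizes rounding mask values that linger between hard round-down and
    round-up decisions; beta is annealed downward over training iterations.
    """
    h = torch.clamp((zeta - gamma) * torch.sigmoid(m_rounding) + gamma, 0.0, 1.0)
    return lam * (1.0 - (h - 0.5).abs().pow(beta)).sum()
```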
S403: and determining a second loss value based on the updated first output data and the second output data, and returning to the step of determining the target weight value under the condition that the second loss value does not meet a second preset condition until the second loss value meets the second preset condition.
Here, the second loss function used to determine the second loss value may be the same as the first loss function used to determine the first loss value; alternatively, the loss function corresponding to the rounding mask parameter may be added on top of the first loss function, so that the resulting second loss function better reflects the real-time change of the rounding mask parameter. The second preset condition may likewise be that the number of return executions reaches a preset count.
Thus, the bias value of the first target network layer is adjusted based on the first output data and the second output data, and/or the weight value of the first target network layer is adjusted based on the first output data, the second output data and the rounding mask parameter to be trained. The first target network layer can therefore be adjusted without introducing additional labeling information, saving the labor cost of network parameter adjustment (that is, training).
In a possible embodiment, when the first target network layer is adjusted based on the first output data and the second output data, it may be adjusted using both mode 1 and mode 2 above. The input data fed to the next target network layer is then re-determined from the updated network parameters of the first target network layer, and the adjusting and updating steps are repeated until all target network layers in the initial quantization network have been updated. In this way, multiple adjustment manners are effectively combined, and the initial quantization network is adjusted to better effect; a sketch of this layer-by-layer flow follows.
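A minimal sketch of this flow, with `bias_correct` and `adaround_weights` passed in as callables standing for mode 1 and mode 2 above (all names are illustrative):

```python
def tune_quantized_network(quant_layers, float_layers, calib_batch,
                           bias_correct, adaround_weights):
    """Adjust each quantized target layer in forward topological order."""
    quant_x, float_x = calib_batch, calib_batch
    for q_layer, f_layer in zip(quant_layers, float_layers):
        first_out = q_layer(quant_x)     # first output data (quantized path)
        second_out = f_layer(float_x)    # second output data (float path)
        bias_correct(q_layer, first_out, second_out)      # mode 1
        adaround_weights(q_layer, first_out, second_out)  # mode 2
        # Re-determine the input to the next target layer from the
        # updated network parameters of this layer.
        quant_x = q_layer(quant_x)
        float_x = second_out
    return quant_layers
```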
In practical applications, after the first target network layer is adjusted through the above label-free tuning process to obtain the target neural network, the network accuracy may still fail to meet a preset accuracy requirement, because no label information was used during the adjustment.
In a possible implementation manner, after obtaining a quantized target neural network corresponding to the network to be quantized, the target neural network may be trained based on the training data and the labeled data of the training data.
Specifically, the target neural network may be trained in a preset gradient estimation manner based on the training data and the labeled data of the training data, so as to further improve the network accuracy of the target neural network.
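The disclosure does not name the gradient estimation manner; a common choice for training through rounding operations is a straight-through estimator, sketched here as an assumption about the intended technique:

```python
import torch

class RoundSTE(torch.autograd.Function):
    """Round in the forward pass; pass gradients through unchanged."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimate: treat d(round(x))/dx as 1.
        return grad_output

# Usage: y = RoundSTE.apply(x) behaves like rounding but stays trainable.
```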
Referring to fig. 5, which is a flowchart of a target detection method provided in the embodiment of the present disclosure, the method includes S501 to S503, where:
S501: Acquiring point cloud data to be detected.
S502: Detecting the point cloud data to be detected based on the target neural network obtained by the quantization method of the neural network according to any embodiment of the present disclosure, and determining a detection result of the point cloud data to be detected.
S503: Controlling a target vehicle to travel based on the detection result.
Controlling the target vehicle to travel includes, for example, controlling the acceleration, deceleration, steering or braking of the target vehicle; alternatively, voice prompt information may be played to prompt the driver to perform such control.
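As a sketch of the S501-S503 flow (the detection output format and the vehicle control hooks are placeholders, not an interface defined by this disclosure):

```python
def detect_and_control(point_cloud, target_net, vehicle):
    """Run the quantized detector on a point cloud and act on the result."""
    detections = target_net(point_cloud)          # S502: run the quantized model
    for det in detections:                        # e.g. boxes with distances
        if det["label"] == "obstacle" and det["distance"] < 10.0:
            vehicle.brake()                       # S503: or prompt the driver
    return detections
```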
In the following, the quantization method of a neural network provided by an embodiment of the present disclosure will be described with reference to a specific example. As shown in fig. 6, the method may include the following steps:
Step 1, acquiring a data set containing correction data
Here, the correction data is training data for adjusting network parameters of the neural network.
Step 2, carrying out stable quantization training on the neural network to be quantized to obtain a stably quantized floating point model
Here, the stable quantization training is used to perform a preliminary range adjustment on the network parameters of the to-be-quantized network layer of the to-be-quantized neural network.
Specifically, for the specific description of the stable quantization training, reference may be made to the related content of performing the preliminary range adjustment on the network parameter of the to-be-quantized network layer of the to-be-quantized neural network based on the boundary parameter in S101, which is not described herein again.
Step 3, carrying out quantization processing on the floating point model according to preset quantization parameters to obtain a preliminary quantization model
Specifically, the specific description of the quantization processing performed on the floating-point model may refer to the related contents in S101, and is not repeated here.
Step 4, adjusting model parameters of the preliminary quantization model according to the preliminary quantization model and the floating point model to obtain an optimized quantization model.
Here, after the model parameters of the preliminary quantization model are adjusted, the quantization model may be further tuned using a preset gradient estimation method, so as to further improve the network accuracy of the optimized quantization model.
In a possible implementation manner, when the model parameters of the preliminary quantization model are adjusted according to the preliminary quantization model and the floating point model, the adjustment may be performed using the steps shown in fig. 7.
Here, when adjusting the model parameters of the preliminary quantization model, the layers may be adjusted in sequence from layer 1 to layer n according to the forward propagation topological order. When layer i is reached, bias value correction and adaptive weight rounding may be performed on its network parameters according to the quantized feature map (that is, the output data of layer i in the quantized network) and the floating point feature map before quantization (that is, the output data of layer i in the network before quantization), and the quantization model is updated so that the adjustment is synchronized into it, completing the adjustment of the layer-i parameters.
Specifically, the relevant description for performing the offset value correction processing may refer to the relevant contents of S301 and S302, and the relevant description for performing the adaptive weight rounding processing may refer to the relevant contents of S401 and S402, and will not be described again here.
According to the quantization method of the neural network, after quantization processing is performed on a neural network to be quantized to obtain an initial quantization network, for a first target network layer in the initial quantization network, training data are processed to obtain intermediate output data based on an adjusted second target network layer before the first target network layer, and first output data of the first target network layer are determined based on the intermediate output data; then processing the training data to obtain second output data based on the network layers to be quantized, which correspond to the first target network layer and the second target network layer respectively, in the neural network to be quantized; and adjusting the first target network layer based on the first output data and the second output data to obtain a quantized target neural network corresponding to the network to be quantized. Therefore, the target network layer can be adjusted under the condition that the marking data corresponding to the training data are not used, so that the labor cost required by training is saved; in addition, in the process of adjusting the first target network layer, the adjusted output data of the second target network layer is combined, so that the error caused by the adjustment of the second target network layer can be considered when the first target network layer is adjusted, the error accumulation is reduced, and the network precision of the quantized target neural network is improved.
It will be understood by those skilled in the art that, in the methods of the present disclosure, the order in which the steps are written does not imply a strict execution order or any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible inherent logic.
Based on the same inventive concept, the embodiment of the present disclosure further provides a quantization apparatus of a neural network corresponding to the quantization method of the neural network, and since the principle of the apparatus in the embodiment of the present disclosure for solving the problem is similar to the quantization method of the neural network described above in the embodiment of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 8, there is shown an architecture diagram of a quantization apparatus of a neural network according to an embodiment of the present disclosure. The apparatus includes: a quantization module 801, a determination module 802, a processing module 803 and an adjustment module 804; wherein:
a quantization module 801, configured to obtain a neural network to be quantized and training data, and perform quantization processing on a network layer to be quantized of the neural network to be quantized to obtain an initial quantization network;
a determining module 802, configured to, for a first target network layer to be adjusted in the initial quantization network, process the training data to obtain intermediate output data based on an adjusted second target network layer before the first target network layer, and process the intermediate output data based on the first target network layer to obtain first output data;
a processing module 803, configured to process the training data to obtain second output data based on to-be-quantized network layers, which correspond to the first target network layer and the second target network layer, in the to-be-quantized neural network, respectively;
an adjusting module 804, configured to adjust a network parameter of the first target network layer based on the first output data and the second output data, so as to obtain a quantized target neural network corresponding to the network to be quantized.
In a possible implementation, before performing quantization processing on the to-be-quantized network layer of the to-be-quantized neural network, the quantization module 801 is further configured to:
and performing preliminary range adjustment on network parameters of a to-be-quantized network layer of the to-be-quantized neural network based on boundary parameters, wherein the boundary parameters are training parameters in an initial training process of the to-be-quantized neural network.
In one possible implementation, the quantization module 801, when performing quantization processing on a to-be-quantized network layer of the to-be-quantized neural network, is configured to:
and at least one of the weight value, the bias value and the activation value of the network layer to be quantized is quantized.
In a possible implementation manner, when performing quantization processing on a to-be-quantized network layer of the to-be-quantized neural network to obtain an initial quantization network, the quantization module 801 is configured to:
for any network layer to be quantized, quantizing the network layer to be quantized based on a plurality of preset quantization parameters corresponding to the network layer to be quantized respectively to obtain a plurality of network layers to be screened after quantization processing corresponding to the network layer to be quantized;
and respectively determining the quantization loss corresponding to each network layer to be screened, and determining a quantized target network layer corresponding to the network layer to be quantized from the plurality of network layers to be screened on the basis of the quantization loss, wherein the quantized target network layer corresponding to each network layer to be quantized forms the initial quantization network.
In a possible implementation manner, the quantization module 801 is configured to determine a quantization loss corresponding to the network layer to be screened according to the following method:
determining the quantization loss based on the output data of the network layer to be screened and the output data of the network layer to be quantized corresponding to the network layer to be screened;
the output data of the network layer to be screened is determined based on the output data of the target network layer which is subjected to quantization processing before the network layer to be screened.
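A simplified sketch of this screening is given below; `make_quantized` is a hypothetical factory producing a quantized copy of the layer for one candidate parameter set, and mean squared error stands in for the quantization loss:

```python
import torch
import torch.nn.functional as F

def select_quantized_layer(float_layer, make_quantized, candidate_params,
                           layer_input):
    """Keep the candidate quantized layer with the lowest quantization loss."""
    float_out = float_layer(layer_input)
    best_layer, best_loss = None, float("inf")
    for params in candidate_params:
        q_layer = make_quantized(float_layer, params)  # layer to be screened
        loss = F.mse_loss(q_layer(layer_input), float_out).item()
        if loss < best_loss:
            best_layer, best_loss = q_layer, loss
    return best_layer
```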
In a possible implementation, the adjusting module 804, when adjusting the network parameter of the first target network layer based on the first output data and the second output data, is configured to:
adjusting a bias value of the first target network layer based on the first output data and the second output data; and/or adjusting the weight value of the first target network layer based on the first output data, the second output data and a rounding mask parameter to be trained, wherein the rounding mask parameter is used for rounding the weight value of the first target network layer.
In a possible implementation, the adjusting module 804, when adjusting the bias value of the first target network layer based on the first output data and the second output data, is configured to:
determining a bias adjustment value corresponding to the first target network layer based on the first output data and the second output data;
adjusting a bias value of the first target network layer based on the bias adjustment value;
determining a first loss value of the first target network layer, and updating the first output data based on the first target network layer after the bias value is adjusted under the condition that the first loss value does not meet a first preset condition;
and returning to the step of determining the bias adjustment value based on the updated first output data and the second output data until the first loss value meets the first preset condition.
In one possible embodiment, the training data includes a sample image, the first output data includes a first feature map, and the second output data includes a second feature map;
the adjusting module 804, when determining the bias adjustment value corresponding to the first target network layer based on the first output data and the second output data, is configured to:
taking the difference between the first mean value of each channel in the second feature map and the second mean value of each channel in the first feature map as the bias adjustment value; alternatively,
determining the bias adjustment value based on the value of each channel of the first feature map and the second feature map after activation processing and a mask, wherein the mask is used for screening the channel values of the first feature map and the second feature map when the bias adjustment value is calculated, and the mask is determined based on the boundary parameter of the first target network layer when preliminary range adjustment is performed.
In a possible implementation, the adjusting module 804, when adjusting the weight value of the first target network layer based on the first output data, the second output data and rounding mask parameters to be trained, is configured to:
rounding each weight value in the first target network layer based on the quantization parameter corresponding to the first target network layer, and determining a target weight value corresponding to each weight value;
updating target weight values corresponding to the weight values respectively based on the rounding mask parameters, and updating the first output data based on the updated weight values;
and determining a second loss value based on the updated first output data and the second output data, and returning to the step of determining the target weight value under the condition that the second loss value does not meet a second preset condition until the second loss value meets the second preset condition.
In a possible implementation manner, after obtaining a quantized target neural network corresponding to the network to be quantized, the adjusting module 804 is further configured to:
and training the target neural network based on the training data and the labeled data of the training data.
Referring to fig. 9, there is shown a schematic diagram of the architecture of a target detection apparatus provided in an embodiment of the present disclosure. The apparatus includes: an acquisition module 901, a detection module 902 and a control module 903; wherein:
an obtaining module 901, configured to obtain point cloud data to be detected;
a detection module 902, configured to detect the point cloud data to be detected based on a target neural network quantized by any one of the neural network quantization methods provided in the embodiments of the present disclosure, and determine a detection result of the point cloud data to be detected;
and a control module 903 for controlling the target vehicle to run based on the detection result.
According to the quantization device for the neural network, after quantization processing is performed on a neural network to be quantized to obtain an initial quantization network, for a first target network layer in the initial quantization network, training data are processed to obtain intermediate output data based on an adjusted second target network layer before the first target network layer, and first output data of the first target network layer are determined based on the intermediate output data; then processing the training data to obtain second output data based on the network layers to be quantized, which correspond to the first target network layer and the second target network layer respectively, in the neural network to be quantized; and adjusting the first target network layer based on the first output data and the second output data to obtain a quantized target neural network corresponding to the network to be quantized. Therefore, the target network layer can be adjusted under the condition that the marking data corresponding to the training data are not used, so that the labor cost required by training is saved; in addition, in the process of adjusting the first target network layer, the adjusted output data of the second target network layer is combined, so that the error caused by the adjustment of the second target network layer can be considered when the first target network layer is adjusted, the error accumulation is reduced, and the network precision of the quantized target neural network is improved.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
Based on the same technical concept, an embodiment of the disclosure also provides a computer device. Referring to fig. 10, a schematic structural diagram of a computer device 1000 provided in an embodiment of the present disclosure includes a processor 1001, a memory 1002 and a bus 1003. The memory 1002 is used for storing execution instructions and includes an internal memory 10021 and an external memory 10022; the internal memory 10021, also referred to as main memory, temporarily stores operation data of the processor 1001 and data exchanged with the external memory 10022 such as a hard disk, and the processor 1001 exchanges data with the external memory 10022 through the internal memory 10021. When the computer device 1000 runs, the processor 1001 and the memory 1002 communicate through the bus 1003, so that the processor 1001 executes the following instructions:
acquiring a neural network to be quantized and training data, and quantizing a network layer to be quantized of the neural network to be quantized based on quantization parameters matched with the neural network to be quantized to obtain an initial quantization network;
processing the training data to obtain output data based on a first target network layer in the initial quantization network and an adjusted second target network layer before the first target network layer, and determining first output data of the first target network layer based on the output data;
processing the training data to obtain second output data based on the network layers to be quantized, which correspond to the first target network layer and the second target network layer respectively, in the neural network to be quantized;
and adjusting the first target network layer based on the first output data and the second output data to obtain a quantized target neural network corresponding to the network to be quantized.
Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the steps of the quantization method of the neural network described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the neural network quantization method in the foregoing method embodiments, which may be referred to specifically for the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, it is embodied as a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may still modify, or readily conceive of changes to, the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of their technical features, within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure and shall be covered within its protection scope. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (15)

1. A method of quantifying a neural network, comprising:
acquiring a neural network to be quantized and training data, and quantizing a network layer to be quantized of the neural network to be quantized to obtain an initial quantization network;
processing the training data to obtain intermediate output data based on a first target network layer to be adjusted in the initial quantization network and an adjusted second target network layer before the first target network layer, and processing the intermediate output data based on the first target network layer to obtain first output data;
processing the training data to obtain second output data based on the network layers to be quantized, which correspond to the first target network layer and the second target network layer respectively, in the neural network to be quantized;
and adjusting the network parameters of the first target network layer based on the first output data and the second output data to obtain a quantized target neural network corresponding to the network to be quantized.
2. The method of claim 1, wherein prior to performing quantization processing on a to-be-quantized network layer of the to-be-quantized neural network, the method further comprises:
and performing preliminary range adjustment on network parameters of a to-be-quantized network layer of the to-be-quantized neural network based on boundary parameters, wherein the boundary parameters are training parameters in an initial training process of the to-be-quantized neural network.
3. The method according to claim 1 or 2, wherein the quantizing the to-be-quantized network layer of the to-be-quantized neural network comprises:
and at least one of the weight value, the bias value and the activation value of the network layer to be quantized is quantized.
4. The method according to any one of claims 1 to 3, wherein the quantizing the to-be-quantized network layer of the to-be-quantized neural network to obtain an initial quantization network comprises:
for any network layer to be quantized, quantizing the network layer to be quantized based on a plurality of preset quantization parameters corresponding to the network layer to be quantized respectively to obtain a plurality of network layers to be screened after quantization processing corresponding to the network layer to be quantized;
and respectively determining the quantization loss corresponding to each network layer to be screened, and determining a quantized target network layer corresponding to the network layer to be quantized from the plurality of network layers to be screened based on the quantization loss, wherein the quantized target network layer corresponding to each network layer to be quantized forms the initial quantization network.
5. The method according to claim 4, wherein the quantization loss corresponding to the network layer to be screened is determined according to the following method:
determining the quantization loss based on the output data of the network layer to be screened and the output data of the network layer to be quantized corresponding to the network layer to be screened;
the output data of the network layer to be screened is determined based on the output data of the target network layer which is subjected to quantization processing before the network layer to be screened.
6. The method according to any of claims 1 to 5, wherein the adjusting the network parameters of the first target network layer based on the first output data and the second output data comprises:
adjusting a bias value of the first target network layer based on the first output data and the second output data; and/or adjusting the weight value of the first target network layer based on the first output data, the second output data and a rounding mask parameter to be trained, wherein the rounding mask parameter is used for rounding the weight value of the first target network layer.
7. The method of claim 6, wherein adjusting the bias value of the first target network layer based on the first output data and the second output data comprises:
determining a bias adjustment value corresponding to the first target network layer based on the first output data and the second output data;
adjusting a bias value of the first target network layer based on the bias adjustment value;
determining a first loss value of the first target network layer, and updating the first output data based on the first target network layer after the bias value is adjusted under the condition that the first loss value does not meet a first preset condition;
and returning to the step of determining the bias adjustment value based on the updated first output data and the second output data until the first loss value meets the first preset condition.
8. The method of claim 7, wherein the training data comprises a sample image, the first output data comprises a first feature map, and the second output data comprises a second feature map;
the determining a bias adjustment value corresponding to the first target network layer based on the first output data and the second output data includes:
taking the difference between the first mean value of each channel in the second feature map and the second mean value of each channel in the first feature map as the bias adjustment value; alternatively,
determining the bias adjustment value based on the value of each channel of the first feature map and the second feature map after activation processing and a mask, wherein the mask is used for screening the channel values of the first feature map and the second feature map when the bias adjustment value is calculated, and the mask is determined based on the boundary parameter of the first target network layer when preliminary range adjustment is performed.
9. The method of claim 6, wherein adjusting the weight values of the first target network layer based on the first output data, the second output data, and rounding mask parameters to be trained comprises:
rounding each weight value in the first target network layer based on the quantization parameter corresponding to the first target network layer, and determining a target weight value corresponding to each weight value;
updating target weight values corresponding to the weight values respectively based on the rounding mask parameters, and updating the first output data based on the updated weight values;
and determining a second loss value based on the updated first output data and the second output data, and returning to the step of determining the target weight value under the condition that the second loss value does not meet a second preset condition until the second loss value meets the second preset condition.
10. The method according to any one of claims 1 to 9, wherein after obtaining the quantized target neural network corresponding to the network to be quantized, the method further comprises:
and training the target neural network based on the training data and the labeled data of the training data.
11. A method of target detection, comprising:
acquiring point cloud data to be detected;
detecting the point cloud data to be detected based on a target neural network obtained by quantization according to the quantization method of the neural network of any one of claims 1 to 10, and determining the detection result of the point cloud data to be detected;
and controlling the target vehicle to run based on the detection result.
12. An apparatus for quantizing a neural network, comprising:
the quantization module is used for acquiring a neural network to be quantized and training data, and performing quantization processing on a network layer to be quantized of the neural network to be quantized to obtain an initial quantization network;
a determining module, configured to, for a first target network layer to be adjusted in the initial quantization network, process the training data to obtain intermediate output data based on an adjusted second target network layer before the first target network layer, and process the intermediate output data based on the first target network layer to obtain first output data;
the processing module is used for processing the training data to obtain second output data based on network layers to be quantized, which correspond to the first target network layer and the second target network layer respectively, in the neural network to be quantized;
and the adjusting module is used for adjusting the network parameters of the first target network layer based on the first output data and the second output data to obtain a quantized target neural network corresponding to the network to be quantized.
13. An object detection device, comprising:
the acquisition module is used for acquiring point cloud data to be detected;
the detection module is used for detecting the point cloud data to be detected based on the target neural network obtained by the quantization method of the neural network according to any one of claims 1 to 10 and determining the detection result of the point cloud data to be detected;
and the control module is used for controlling the target vehicle to run based on the detection result.
14. A computer device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when a computer device is run, the machine-readable instructions, when executed by the processor, performing the steps of the method of quantifying a neural network as claimed in any one of claims 1 to 10, or performing the steps of the method of object detection as claimed in claim 11.
15. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, is adapted to carry out the steps of the method for quantization of neural networks of any one of claims 1 to 10, or the steps of the method for object detection of claim 11.
CN202210603119.2A 2022-05-30 2022-05-30 Neural network quantification method, target detection method and device Pending CN115018070A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116739039A (en) * 2023-05-05 2023-09-12 北京百度网讯科技有限公司 Quantization method, device, equipment and medium of distributed deployment model
CN116543419A (en) * 2023-07-06 2023-08-04 浙江大学金华研究院 Hotel health personnel wearing detection method and system based on embedded platform
CN116543419B (en) * 2023-07-06 2023-11-07 浙江大学金华研究院 Hotel health personnel wearing detection method and system based on embedded platform


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination