CN115034389A - Neural network quantization and processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115034389A
CN115034389A
Authority
CN
China
Prior art keywords: quantized, segmentation point, determining, feature data, processing layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210911217.2A
Other languages
Chinese (zh)
Inventor
张卓翼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Power Tensors Intelligent Technology Co Ltd
Original Assignee
Shanghai Power Tensors Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Power Tensors Intelligent Technology Co Ltd
Priority to CN202210911217.2A
Publication of CN115034389A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/04: Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a neural network quantization and processing method, an apparatus, an electronic device and a storage medium. The method comprises: acquiring a neural network to be quantized; and, for any network processing layer to be quantized in the neural network, determining a target segmentation point corresponding to the feature data of that layer based on the feature data and a quantization error expression, where the quantization error expression is used to determine the error between the feature data and the quantized data corresponding to it. In the inference process of the neural network to be quantized, the feature data of each network processing layer can then be quantized based on the target segmentation point corresponding to that layer.

Description

Neural network quantization and processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of deep learning technologies, and in particular, to a neural network quantization and processing method, an apparatus, an electronic device, and a storage medium.
Background
In recent years, Deep Neural Networks (DNNs) have been widely used in the field of computer vision, for tasks such as image classification and object detection. Deeper or wider network structures achieve higher accuracy, but at the cost of higher computational complexity and increased memory requirements, which limits their deployment on resource-constrained embedded devices.
Generally, one possible approach to deploying deep neural networks on embedded devices is to quantize the weights and activations of the full-precision network, reducing the bit width required to store the data and hence the number of discrete values used to represent it. It is therefore important to provide a quantization method with high precision.
Disclosure of Invention
In view of the above, the present disclosure provides at least a neural network quantization and processing method, apparatus, electronic device and storage medium.
In a first aspect, the present disclosure provides a neural network quantization method, including:
acquiring a neural network to be quantized;
for any network processing layer to be quantized in the neural network to be quantized, determining a target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and a quantization error expression, where the quantization error expression is used for determining an error between the feature data and the quantized data corresponding to the feature data, so that, in the inference process of the neural network to be quantized, the neural network can be quantized based on the target segmentation point corresponding to the network processing layer to be quantized.
According to the above method, after the neural network to be quantized is obtained, for any network processing layer to be quantized, a target segmentation point corresponding to the feature data of that layer is determined based on the feature data and a quantization error expression. Because the quantization error expression determines the error between the feature data and the quantized data corresponding to it, a target segmentation point with a small error can be selected, so the target segmentation point is determined accurately. In turn, the neural network to be quantized can be accurately quantized based on the target segmentation point corresponding to each network processing layer to be quantized, improving quantization accuracy while maintaining quantization speed.
In one possible implementation, the determining a target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and a quantization error expression includes:
determining an initialization iteration number as the current iteration number, determining a target feature value as the historical segmentation point, determining an initialization error as the error threshold, and determining a current segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number; wherein the target feature value is the feature value with the maximum absolute value in the feature data;
determining a quantization error corresponding to the current segmentation point based on the current segmentation point, the feature data and the quantization error expression;
under the condition that the quantization error is smaller than the error threshold, adding one to the current iteration number to obtain an updated current iteration number; determining the quantization error as an updated error threshold, determining the current segmentation point as an updated historical segmentation point, and returning to the step of determining the current segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number, until the updated current iteration number is greater than the total iteration number;
and determining the historical segmentation point obtained after the last iteration as a target segmentation point corresponding to the feature data of the network processing layer to be quantized.
In the embodiment of the disclosure, by means of multiple iterations, based on the quantization error of the current segmentation point determined in each iteration, the target segmentation point corresponding to the feature data of the network processing layer to be quantized can be determined more accurately.
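As an illustration, the iterative search described above can be sketched in code. This is a minimal NumPy sketch and not the patented implementation: the candidate-point schedule, the two-piece quantizer, and the use of mean squared error as the quantization error expression are all assumptions, since the excerpt does not publish the concrete formulas.

```python
import numpy as np

def piecewise_quantize(x, p, m, bits=8):
    """Quantize x with segmentation point p and max-magnitude m (two linear pieces)."""
    n = 2 ** (bits - 1) - 1                     # levels per half-range
    scale1 = p / n if p > 0 else 1.0            # step inside [-p, p]
    q1 = np.round(np.clip(x, -p, p) / scale1) * scale1
    scale2 = (m - p) / n if m > p else 1.0      # step inside [p, m] (mirrored for negatives)
    rest = np.clip(np.abs(x), p, m) - p
    q2 = np.sign(x) * (p + np.round(rest / scale2) * scale2)
    return np.where(np.abs(x) <= p, q1, q2)

def search_target_point(x, total_iters=100):
    """Iteratively shrink the candidate segmentation point, keeping the best one."""
    m = float(np.max(np.abs(x)))                # target feature value: max |feature value|
    best_p, best_err = m, np.inf                # historical point / error threshold
    for i in range(1, total_iters + 1):
        p = best_p * (1 - i / total_iters)      # assumed schedule from history and total iters
        err = float(np.mean((x - piecewise_quantize(x, p, m)) ** 2))
        if err < best_err:                      # error below threshold: accept the point
            best_err, best_p = err, p
    return best_p
```

When the quantization error of the current point beats the running threshold, that point becomes the new historical segmentation point, mirroring the accept/update loop in the steps above.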
In a possible embodiment, the method further comprises:
under the condition that the quantization error corresponding to the current segmentation point is larger than or equal to the error threshold, adding one to the current iteration number to obtain an updated current iteration number, and returning to the step of determining the current segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number until the updated current iteration number is larger than the total iteration number;
and determining the current segmentation point obtained after the last iteration as a target segmentation point corresponding to the feature data of the network processing layer to be quantized.
In a possible embodiment, the determining a quantization error corresponding to the current segmentation point based on the current segmentation point, the feature data and the quantization error expression includes:
determining a quantized feature value corresponding to each feature value in the feature data based on the current segmentation point and a set quantization expression;
and determining the quantization error corresponding to the current segmentation point based on each feature value in the feature data, the quantized feature value corresponding to each feature value, and the quantization error expression.
During implementation, the quantized feature value corresponding to each feature value in the feature data under the current segmentation point can be determined based on the current segmentation point and the set quantization expression, and then the quantization error corresponding to the current segmentation point is determined more accurately based on each feature value in the feature data, the quantized feature value corresponding to each feature value and the quantization error expression, so that a determination basis is provided for subsequently determining the target segmentation point.
In a possible implementation manner, the determining, based on the current segmentation point and the set quantization expression, a quantized feature value corresponding to each feature value in the feature data includes:
determining a first segment interval and a second segment interval corresponding to the feature data based on the current segmentation point; wherein the first segment interval comprises: the interval between the negative value of the current segmentation point and the positive value of the current segmentation point; and the second segment interval comprises: the interval between the positive value of the current segmentation point and the positive value of the target feature value, and the interval between the negative value of the target feature value and the negative value of the current segmentation point;
for each feature value in the feature data, under the condition that the feature value is located in the first segment interval, determining a quantized feature value corresponding to the feature value by using a first quantization expression corresponding to the first segment interval;
and under the condition that the feature value is located in the second segment interval, determining a quantized feature value corresponding to the feature value by using a second quantization expression corresponding to the second segment interval.
In this way, based on the quantization expressions respectively corresponding to the first segment interval and the second segment interval, the quantized feature value corresponding to each feature value can be determined, providing data support for subsequently determining the quantization error.
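For instance, with a segmentation point p and a target feature value m (the maximum absolute feature value), the two intervals and their quantization expressions might look like the following sketch. The concrete step sizes and level count are assumptions; the patent excerpt does not publish the actual first and second quantization expressions.

```python
import math

def quantize_value(v, p, m, bits=8):
    """Quantize one feature value with a two-piece linear scheme.

    First segment interval:  [-p, p]           -> step p / n
    Second segment interval: [-m, -p] U [p, m] -> step (m - p) / n
    n is the number of levels per half-range (illustrative choice).
    """
    n = 2 ** (bits - 1) - 1
    if abs(v) <= p:                      # first quantization expression
        step = p / n
        return round(v / step) * step
    step = (m - p) / n                   # second quantization expression
    core = min(abs(v), m) - p
    return math.copysign(p + round(core / step) * step, v)
```

Values inside the first interval are quantized with the finer step p/n, while values in the second interval are offset by p and quantized with step (m - p)/n, so more levels are spent where bell-shaped feature data is dense.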
In a possible implementation manner, in a case that the feature data includes input feature data and/or output feature data, before the determining, based on the feature data of the network processing layer to be quantized and a quantization error expression, of the target segmentation point corresponding to the feature data of the network processing layer to be quantized, the method further includes:
acquiring calibration data, wherein the calibration data comprises at least one calibration image;
for each calibration image, determining feature data corresponding to the network processing layer to be quantized and matched with the calibration image based on the calibration image and the neural network to be quantized.
When the feature data includes input feature data and/or output feature data, the input and output feature data depend on the inference process of the neural network. In order to determine the target segmentation points corresponding to the input and output feature data in advance, calibration data may be acquired; for each calibration image, the feature data that corresponds to the network processing layer to be quantized and matches the calibration image is determined based on the calibration image and the neural network to be quantized, so that the target segmentation point can be determined on the basis of the feature data corresponding to the calibration images.
In one possible implementation, the determining a target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and a quantization error expression includes:
for each calibration image, determining candidate segmentation points corresponding to the feature data of the calibration image based on the quantization error expression and the feature data corresponding to the network processing layer to be quantized and matched with the calibration image;
and determining a target segmentation point corresponding to the network processing layer to be quantized based on the candidate segmentation points respectively corresponding to the characteristic data of each calibration image.
In the embodiment of the disclosure, when a plurality of calibration images are provided, the candidate segmentation point corresponding to each calibration image may be determined, and then the target segmentation point corresponding to the network processing layer to be quantized is determined more accurately based on the candidate segmentation points corresponding to the feature data of each calibration image.
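For example, when several calibration images are used, the per-image candidate segmentation points must be combined into a single target point for the layer. The excerpt does not state the combination rule, so the averaging below (and the function names) are assumptions for illustration:

```python
import numpy as np

def target_point_for_layer(per_image_features, search_fn):
    """Combine candidate segmentation points from each calibration image's
    feature data into one target point for the network processing layer.

    search_fn(features) -> candidate segmentation point for one image.
    Averaging is one plausible rule; the patent excerpt does not specify it.
    """
    candidates = [search_fn(f) for f in per_image_features]
    return float(np.mean(candidates))
```

Other combination rules (a median, or re-evaluating the quantization error of each candidate on the pooled feature data) would fit the same interface.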
In a second aspect, the present disclosure provides a neural network processing method, including:
acquiring a target segmentation point corresponding to at least one network processing layer to be quantized in a neural network to be quantized;
quantizing the feature data of the network processing layer to be quantized based on the target segmentation point corresponding to the network processing layer to be quantized to obtain quantized feature data;
and determining an inference result of the neural network to be quantized based on the quantized feature data.
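The three steps of this processing method can be sketched as follows; the two-piece quantizer and the toy matrix product standing in for a real network processing layer are assumptions for illustration only:

```python
import numpy as np

def quantize_features(x, p, m, bits=8):
    """Two-piece quantizer for a layer's feature data (illustrative expressions).

    Assumes 0 < p < m, where m is the maximum absolute feature value.
    """
    n = 2 ** (bits - 1) - 1
    inner = np.round(np.clip(x, -p, p) / (p / n)) * (p / n)
    rest = np.clip(np.abs(x), p, m) - p
    outer = np.sign(x) * (p + np.round(rest / ((m - p) / n)) * ((m - p) / n))
    return np.where(np.abs(x) <= p, inner, outer)

def infer_layer(features, weight, target_point):
    """Quantize the layer's feature data at its target segmentation point,
    then run the layer's computation (a toy linear op stands in here)."""
    m = float(np.max(np.abs(features)))
    q = quantize_features(features, target_point, m)   # quantized feature data
    return q @ weight                                  # inference on quantized data
```

In a real deployment, the target segmentation point for each layer would be the one obtained in advance by the quantization method of the first aspect.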
For descriptions of the effects of the apparatus, the electronic device and the like below, reference is made to the description of the method above; details are not repeated here.
In a third aspect, the present disclosure provides a neural network quantization apparatus, including:
the first acquisition module is used for acquiring a neural network to be quantized;
the first determining module is used for, for any network processing layer to be quantized in the neural network to be quantized, determining a target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and a quantization error expression; the quantization error expression is used for determining an error between the feature data and the quantized data corresponding to the feature data, so that in the inference process of the neural network to be quantized, the neural network can be quantized based on the target segmentation point corresponding to the network processing layer to be quantized.
In one possible implementation, the first determining module, when determining the target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and a quantization error expression, is configured to:
determining an initialization iteration number as the current iteration number, determining a target feature value as the historical segmentation point, determining an initialization error as the error threshold, and determining a current segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number; wherein the target feature value is the feature value with the maximum absolute value in the feature data;
determining a quantization error corresponding to the current segmentation point based on the current segmentation point, the feature data and the quantization error expression;
under the condition that the quantization error is smaller than the error threshold, adding one to the current iteration number to obtain an updated current iteration number; determining the quantization error as an updated error threshold, determining the current segmentation point as an updated historical segmentation point, and returning to the step of determining the current segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number, until the updated current iteration number is greater than the total iteration number;
and determining the historical segmentation point obtained after the last iteration as a target segmentation point corresponding to the feature data of the network processing layer to be quantized.
In a possible implementation, the first determining module is further configured to:
under the condition that the quantization error corresponding to the current segmentation point is larger than or equal to the error threshold, adding one to the current iteration number to obtain an updated current iteration number, and returning to the step of determining the current segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number until the updated current iteration number is larger than the total iteration number;
and determining the current segmentation point obtained after the last iteration as a target segmentation point corresponding to the feature data of the network processing layer to be quantized.
In a possible implementation, the first determining module, when determining the quantization error corresponding to the current segmentation point based on the current segmentation point, the feature data and the quantization error expression, is configured to:
determining a quantized feature value corresponding to each feature value in the feature data based on the current segmentation point and a set quantization expression;
and determining the quantization error corresponding to the current segmentation point based on each feature value in the feature data, the quantized feature value corresponding to each feature value, and the quantization error expression.
In one possible implementation, the first determining module, when determining the quantized feature value corresponding to each feature value in the feature data based on the current segmentation point and the set quantized expression, is configured to:
determining a first segment interval and a second segment interval corresponding to the feature data based on the current segmentation point; wherein the first segment interval comprises: the interval between the negative value of the current segmentation point and the positive value of the current segmentation point; and the second segment interval comprises: the interval between the positive value of the current segmentation point and the positive value of the target feature value, and the interval between the negative value of the target feature value and the negative value of the current segmentation point;
for each feature value in the feature data, under the condition that the feature value is located in the first segment interval, determining a quantized feature value corresponding to the feature value by using a first quantization expression corresponding to the first segment interval;
and under the condition that the feature value is located in the second segment interval, determining a quantized feature value corresponding to the feature value by using a second quantization expression corresponding to the second segment interval.
In a possible implementation manner, in a case that the feature data includes input feature data and/or output feature data, the apparatus further includes an extraction module configured to, before the target segmentation point corresponding to the feature data of the network processing layer to be quantized is determined based on the feature data of the network processing layer to be quantized and a quantization error expression:
acquiring calibration data, wherein the calibration data comprises at least one calibration image;
for each calibration image, determining feature data corresponding to the network processing layer to be quantized and matched with the calibration image based on the calibration image and the neural network to be quantized.
In one possible implementation, the first determining module, when determining the target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and a quantization error expression, is configured to:
for each calibration image, determining candidate segmentation points corresponding to the feature data of the calibration image based on the quantization error expression and the feature data corresponding to the network processing layer to be quantized and matched with the calibration image;
and determining a target segmentation point corresponding to the network processing layer to be quantized based on the candidate segmentation points respectively corresponding to the characteristic data of each calibration image.
In a fourth aspect, the present disclosure provides a neural network processing apparatus, comprising:
the second acquisition module is used for acquiring a target segmentation point corresponding to at least one network processing layer to be quantized in the neural network to be quantized;
the quantization module is used for performing quantization processing on the feature data of the network processing layer to be quantized based on the target segmentation point corresponding to the network processing layer to be quantized to obtain quantized feature data;
and the second determining module is used for determining an inference result of the neural network to be quantized based on the quantized feature data.
In a fifth aspect, the present disclosure provides an electronic device comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the neural network quantization method according to the first aspect or any of the embodiments described above, or performing the steps of the neural network processing method according to the second aspect described above.
In a sixth aspect, the present disclosure provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the neural network quantization method as set forth in the first aspect or any one of the embodiments described above, or performs the steps of the neural network processing method as set forth in the second aspect described above.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive other related drawings from them without creative effort.
Fig. 1 illustrates a flow chart of a neural network quantization method provided by an embodiment of the present disclosure;
fig. 2 is a schematic flow chart diagram illustrating a neural network processing method provided by an embodiment of the present disclosure;
fig. 3 is a schematic diagram illustrating an architecture of a neural network quantization apparatus provided in an embodiment of the present disclosure;
fig. 4 is a schematic diagram illustrating an architecture of a neural network processing device according to an embodiment of the present disclosure;
fig. 5 shows a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the disclosure, provided in the accompanying drawings, is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
One possible approach to deploying deep neural networks on embedded devices is to quantize the weights and activations of the full-precision network, reducing the bit width required to store the data and hence the number of discrete values used to represent it.
When convolutional neural networks for different tasks and with different structures are studied, it is found that the weights or intermediate features (such as the input and output features of a network processing layer) of convolutional layers follow a bell-shaped distribution: most values lie near 0 and few lie at the two boundaries. A linear quantization function applies a uniform quantization interval over the entire value range, so linear quantization generally cannot describe a bell-shaped distribution well. To capture the bell-shaped distribution, nonlinear quantization functions have been proposed for quantizing convolutional neural networks, but nonlinear quantization requires more quantization parameters and more computing resources during quantization, and is not hardware-friendly.
Alternatively, a piecewise linear quantization function can be used; compared with linear quantization, it adds only one hyperparameter, the position of the segmentation point. In piecewise quantization, determining the position of the segmentation point is the key to the quantization precision. Typically, the parameter distribution of a convolutional neural network is assumed to be Gaussian or Laplacian, and the optimal segmentation point position is then derived from the properties of that distribution. However, this way of determining segmentation points makes strong a priori assumptions about the network's distribution; in practice, the parameter distribution differs between networks and between layers of the same network, and methods based on an assumed distribution struggle to guarantee quantization accuracy.
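The claim that one segmentation point helps on bell-shaped data is easy to check empirically. The sketch below compares plain linear quantization with a two-piece quantizer on Gaussian samples; the 4-bit width and the segmentation point at a quarter of the range are arbitrary illustrative choices, not values from the patent:

```python
import numpy as np

def uniform_quantize(x, m, bits=4):
    """Plain linear quantization over [-m, m] with a single uniform step."""
    n = 2 ** (bits - 1) - 1
    step = m / n
    return np.clip(np.round(x / step), -n, n) * step

def two_piece_quantize(x, p, m, bits=4):
    """Piecewise linear quantization: finer steps inside [-p, p], coarser outside."""
    n = 2 ** (bits - 1) - 1
    inner = np.round(np.clip(x, -p, p) / (p / n)) * (p / n)
    rest = np.clip(np.abs(x), p, m) - p
    outer = np.sign(x) * (p + np.round(rest / ((m - p) / n)) * ((m - p) / n))
    return np.where(np.abs(x) <= p, inner, outer)

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)        # bell-shaped: most values near 0
m = float(np.max(np.abs(x)))
mse_lin = float(np.mean((x - uniform_quantize(x, m)) ** 2))
mse_pw = float(np.mean((x - two_piece_quantize(x, m / 4, m)) ** 2))
# On Gaussian data the two-piece quantizer spends its levels where the
# values are dense, so mse_pw comes out well below mse_lin.
```

The gap between the two errors is what a well-chosen segmentation point buys; the patent's method searches for that point per layer instead of assuming a distribution.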
To alleviate the above problem, in the disclosed method, for each network processing layer to be quantized, a target segmentation point corresponding to the layer's feature data is determined based on that feature data and a quantization error expression, where the quantization error expression is used to determine the error between the feature data and the quantized data corresponding to it. For example, the segmentation point with the smaller error may be selected as the target segmentation point, making the determination of the target segmentation point more accurate; based on the target segmentation point corresponding to each network processing layer to be quantized, the neural network to be quantized can then be quantized more accurately, improving quantization precision.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
For the understanding of the embodiments of the present disclosure, a neural network quantization method disclosed in the embodiments of the present disclosure will be described in detail first. The execution subject of the neural network quantization method provided by the embodiment of the present disclosure is generally a computer device with certain computing power, and the computer device includes, for example: a terminal device, which may be a User Equipment (UE), a mobile device, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or a server or other processing device. In some possible implementations, the neural network quantization method may be implemented by a processor invoking computer readable instructions stored in a memory.
Referring to fig. 1, a schematic flow chart of a neural network quantization method provided in an embodiment of the present disclosure is shown, the method includes S101-S102, where:
S101, obtaining a neural network to be quantized;
S102, for any network processing layer to be quantized in the neural network to be quantized, determining a target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of that layer and a quantization error expression, where the quantization error expression is used for determining the error between the feature data and the quantized data corresponding to the feature data, so that, in the inference process of the neural network to be quantized, the network is quantized based on the target segmentation points corresponding to the network processing layers to be quantized.
According to the method, after the neural network to be quantized is obtained, a target segmentation point corresponding to the feature data of each network processing layer to be quantized is determined based on that layer's feature data and the quantization error expression. Since the quantization error expression determines the error between the feature data and the quantized data corresponding to it, the selected target segmentation point has a small error and is determined accurately. The neural network to be quantized (for example, the operation feature data of its network processing layers) can then be quantized accurately based on the target segmentation points corresponding to the network processing layers to be quantized, which improves the quantization accuracy while maintaining the quantization rate.
S101 and S102 will be specifically described below.
For S101:
here, the neural network to be quantized may be any neural network after training, for example, the neural network to be quantized may be a neural network for performing a semantic segmentation task, a neural network for performing a classification task, a neural network for performing a regression task, or the like.
The network structure of the neural network to be quantized may be set as required, for example, the neural network to be quantized may include a convolution processing layer, a full-link processing layer, a pooling processing layer, an activation processing layer, and the like.
For S102:
after the neural network to be quantized is acquired, a network processing layer to be quantized in the neural network to be quantized can be determined. For example, each network processing layer in the neural network to be quantized may be determined as a network processing layer to be quantized; or, a part of the network processing layer in the neural network to be quantized may also be determined as the network processing layer to be quantized. Wherein, the network processing layer to be quantized can be set according to the requirement.
After the network processing layers to be quantized are determined, for any network processing layer to be quantized, a target segmentation point corresponding to the feature data of that layer is determined based on the feature data of the network processing layer to be quantized and a quantization error expression, where the quantization error expression is used for determining the error between the feature data and the quantized data corresponding to the feature data.
In one embodiment, a plurality of initial segmentation points may be determined, and a target segmentation point may then be selected from them according to the quantization error expression. First, a plurality of initial segmentation points may be determined from the feature interval in which the feature values of the feature data lie. For example, the feature value with the largest absolute value among the feature values may be determined, and the feature interval derived from it; assuming the feature value with the largest absolute value is m, the feature interval may be [-m, m]. Then, a plurality of initial segmentation points are determined from the feature interval: a preset number of initial segmentation points may be selected uniformly from the feature interval, or non-uniformly, for example with fewer initial segmentation points at the boundaries of the feature interval (positions close to m and -m) and more initial segmentation points in the middle of the feature interval (positions close to 0).
Secondly, for each initial segmentation point, determining an initial quantization error corresponding to the initial segmentation point based on the initial segmentation point, the feature data and the quantization error expression. During implementation, the characteristic data can be linearly quantized based on the initial segmentation point to obtain quantized data corresponding to the characteristic data at the initial segmentation point; and then, determining the quantization error between the characteristic data and the quantized data by using a quantization error expression to obtain an initial quantization error corresponding to the initial segmentation point. Therefore, the initial quantization errors corresponding to the initial segmentation points can be obtained.
And further, determining a target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the initial quantization error corresponding to each initial segmentation point. For example, an initial segmentation point with the minimum initial quantization error may be selected and determined as a target segmentation point corresponding to the feature data of the network processing layer to be quantized.
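As a sketch of the candidate-selection step above (a NumPy illustration with hypothetical names, not the patent's implementation), uniform and non-uniform initial segmentation points can be drawn from the feature interval like this:

```python
import numpy as np

def candidate_points(feat, n=16, uniform=True):
    """Generate n initial segmentation points inside (0, m), where m is the
    largest absolute feature value (hypothetical helper)."""
    m = np.abs(feat).max()
    u = np.linspace(0.0, 1.0, n + 2)[1:-1]   # n interior points of (0, 1)
    if uniform:
        return m * u
    # squaring pushes points toward 0, so the middle of the feature interval
    # (near 0) gets more candidates than the boundary near +/-m
    return m * u ** 2
```

Each candidate would then be scored with the quantization error expression, and the one with the minimum error kept as the target segmentation point.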
In another embodiment, a target segmentation point corresponding to the feature data of the network processing layer to be quantized may be determined in an iterative manner based on the feature data of the network processing layer to be quantized and the quantization error expression.
In S102, determining a target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and the quantization error expression, specifically including:
step A1, determining the number of initialization iterations as the current number of iterations, determining the target characteristic value as a historical segmentation point, determining the initialization error as an error threshold, and determining the current segmentation point corresponding to the current number of iterations based on the historical segmentation point and the total number of iterations; the target characteristic value is the characteristic value with the maximum absolute value in the characteristic data.
Step A2, based on the current segmentation point, the characteristic data and the quantization error expression, determining the quantization error corresponding to the current segmentation point.
Step A3, under the condition that the quantization error is smaller than the error threshold, adding one to the current iteration frequency to obtain the updated current iteration frequency; and determining the quantization error as an updated error threshold, determining the current segmentation point as an updated historical segmentation point, returning to the step of determining the current segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number until the updated current iteration number is greater than the total iteration number.
And step A4, under the condition that the quantization error corresponding to the current segmentation point is greater than or equal to the error threshold, adding one to the current iteration number to obtain the updated current iteration number, and returning to the step of determining the current segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number until the updated current iteration number is greater than the total iteration number.
Step A5, determining the historical segment points obtained after the last iteration as the target segment points corresponding to the characteristic data of the network processing layer to be quantified.
In step A1, an initialization iteration number is set; for example, the initialization iteration number may be 1, i.e., the first iteration. An initialization error is set; for example, the initialization error may be a preset value (e.g., 1000). A target feature value is determined, that is, among all feature values included in the feature data, the feature value with the largest absolute value is determined as the target feature value. For example, the maximum feature value and the minimum feature value among the feature values included in the feature data may be determined first, then the first absolute value corresponding to the maximum feature value and the second absolute value corresponding to the minimum feature value, and the larger of the first absolute value and the second absolute value determined as the target feature value.
The number of initialization iterations may be determined as the current number of iterations, the target feature value may be determined as a historical segmentation point, the initialization error may be determined as an error threshold, and the iteration process corresponding to the current number of iterations may be performed.
In implementation, the current segmentation point corresponding to the current iteration number may be determined based on the historical segmentation point and the total iteration number. For example, the current segmentation point corresponding to the current iteration number may be determined according to the following formula (1):
$$\alpha_i = \frac{n-i}{n}\,\alpha_{i-1} \qquad (1)$$

wherein n is the total iteration number, i is the current iteration number, $\alpha_i$ is the current segmentation point, and $\alpha_{i-1}$ is the historical segmentation point.
In step a2, after obtaining a current segmentation point corresponding to the current iteration number, performing linear quantization processing on the feature data based on the current segmentation point to obtain quantized data corresponding to the feature data under the current segmentation point; and then, determining the quantization error corresponding to the current segmentation point based on the characteristic data and the quantized data by using a quantization error expression.
In an optional embodiment, determining a quantization error corresponding to a current segmentation point based on the current segmentation point, feature data, and a quantization error expression specifically includes:
step A21, based on the current segmentation point and the set quantization expression, determining the quantized feature value corresponding to each feature value in the feature data.
Step A22, determining the quantization error corresponding to the current segmentation point based on each feature value in the feature data, the quantized feature value corresponding to each feature value, and the quantization error expression.
During implementation, the quantized feature value corresponding to each feature value in the feature data under the current segmentation point can be determined based on the current segmentation point and the set quantization expression, and then the quantization error corresponding to the current segmentation point is more accurately determined based on each feature value in the feature data, the quantized feature value corresponding to each feature value and the quantization error expression, so that a determination basis is provided for subsequently determining the target segmentation point.
In step a21, the quantization expression is a linear quantization expression. After the current segmentation point is obtained, a target expression corresponding to the current segmentation point can be determined based on the set quantization expression; inputting each characteristic value in the characteristic data into a target expression to obtain a quantized characteristic value corresponding to the characteristic value; and forming quantized data corresponding to the feature data based on the quantized feature values.
In an alternative embodiment, determining a quantized feature value corresponding to each feature value in the feature data based on the current segmentation point and the set quantized expression includes:
Step one, determining a first segmentation interval and a second segmentation interval corresponding to the feature data based on the current segmentation point; wherein the first segmentation interval includes: the interval from the negative value of the current segmentation point to the positive value of the current segmentation point; and the second segmentation interval includes: the interval from the positive value of the current segmentation point to the positive value of the target feature value, and the interval from the negative value of the target feature value to the negative value of the current segmentation point.
Step two, in the case that a feature value is located in the first segmentation interval, determining the quantized feature value corresponding to the feature value by using a first quantization expression corresponding to the first segmentation interval.
Step three, in the case that a feature value is located in the second segmentation interval, determining the quantized feature value corresponding to the feature value by using a second quantization expression corresponding to the second segmentation interval.
If the current segmentation point is α, the first segmentation interval corresponding to the feature data may be [-α, α], and the second segmentation interval may be [-m, -α) ∪ (α, m], where m is the target feature value.
After the first segment interval and the second segment interval are obtained, a first quantization expression corresponding to the first segment interval and a second quantization expression corresponding to the second segment interval may be determined based on the set quantization expressions. The first and second quantization expressions are as shown in the following equation (2):
$$p(x) = \begin{cases} \operatorname{sign}(x)\cdot\operatorname{uni}(|x|;\,0,\,\alpha), & |x| \in [0, \alpha] \\ \operatorname{sign}(x)\cdot\operatorname{uni}(|x|;\,\alpha,\,m), & |x| \in (\alpha, m] \end{cases} \qquad (2)$$

wherein p(x) is the quantized feature value; sign() denotes the sign function, and uni() denotes the linear quantization function. The linear quantization function is shown in equation (3) below:

$$\operatorname{uni}(x;\,x_l,\,x_h) = s\cdot\left(\operatorname{round}\!\left(\frac{x - x_l}{s}\right) + z\right) + x_l, \qquad s = \frac{x_h - x_l}{2^{b} - 1} \qquad (3)$$

where round() is a rounding function, b represents the quantization bit width, and s is the scaling factor. $x_h$ is the maximum value within the segmentation interval (for the first segmentation interval in equation (2), $x_h$ is α), $x_l$ is the minimum value within the segmentation interval (for the first segmentation interval, $x_l$ is 0), and z is the zero point (z is 0 in formula (2)). x is a feature value in the feature data, corresponding to |x| in formula (2).
Further, quantization is hardware dependent: its purpose is to reduce computation time and energy consumption, and the computation is ultimately carried out on hardware. The quantization bit width can therefore be determined from the integrated circuit (chip) that runs the neural network to be quantized. As a hardware-constrained device, a chip often supports data types of lower precision than a device without such constraints (e.g., a server). For example, when the data type supported by the chip is int8 (that is, the data type of the quantized feature values is int8), in order to keep the quantized feature values within the range of the supported data type, the value obtained by the rounding function is scaled by the scaling factor s; the value of b is thus tied to the data type, and when the data type is int8, b may be set to 8.
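A small worked example (the breakpoint value is assumed for illustration) of how the scaling factor ties the rounded value to an int8-style bit width:

```python
# Assumed values for illustration: breakpoint alpha and bit width b.
alpha = 1.6
b = 8                                # int8-style bit width
s = (alpha - 0.0) / (2 ** b - 1)     # scaling factor for the [0, alpha] segment
code = round(0.5 / s)                # integer code produced by the rounding step
dequant = code * s                   # scaling the rounded value back by s
```

With b = 8 there are 2^8 - 1 = 255 quantization steps, so every code stays within the range the chip's data type can represent, and the dequantized value differs from the original by at most half a step.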
For each feature value in the feature data, when the feature value is located in the first segmentation interval, i.e., in [-α, α], the quantized feature value corresponding to it is determined by using the first quantization expression. When the feature value is located in the second segmentation interval, i.e., in [-m, -α) ∪ (α, m], the quantized feature value corresponding to it is determined by using the second quantization expression.
And then, based on quantization expressions respectively corresponding to the first segment interval and the second segment interval, determining a quantized feature value corresponding to each feature value, and providing data support for subsequently determining quantization errors.
In step a22, a quantization error corresponding to the current segmentation point is determined based on each feature value in the feature data and the quantized feature value corresponding to each feature value by using a quantization error expression. The quantization error expression is shown in the following equation (4):
$$E = \Big(\sum_{x}\big|\,\omega_x - \hat{\omega}_x\,\big|^{p}\Big)^{1/p} \qquad (4)$$

wherein $\omega_x$ is the feature value at position x in the feature data, $\hat{\omega}_x$ is the quantized feature value corresponding to that feature value, and p is the order of the norm; for example, p may be 1, 2, and so on.
After the quantization error corresponding to the current segmentation point is obtained, the quantization error is compared with the current error threshold. If the quantization error corresponding to the current segmentation point is smaller than the error threshold, step A3 is executed; if it is greater than or equal to the error threshold, step A4 is executed.
In step a3, under the condition that the quantization error is smaller than the error threshold, performing an operation of adding one to the current iteration number to obtain an updated current iteration number; for example, if the current iteration count is the first time (i.e., i equals 1), the updated current iteration count is the second time (i.e., i equals 2).
And determining the quantization error as an updated error threshold, wherein the updated error threshold is an error threshold corresponding to the updated current iteration number. And determining the current segmentation point as the updated historical segmentation point, namely the updated historical segmentation point is the historical segmentation point corresponding to the updated current iteration number. And returning to the step of determining the current segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number until the updated current iteration number is greater than the total iteration number.
In step a4, when the quantization error is greater than or equal to the error threshold, an operation is performed to add one to the current iteration number to obtain an updated current iteration number. And the error threshold and the historical segmentation point are not updated, and the step of determining the current segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number is directly returned until the updated current iteration number is greater than the total iteration number.
In step a5, the historical segment point obtained after the last iteration may be further determined as a target segment point corresponding to the feature data of the network processing layer to be quantized.
If the total iteration number is 100, after obtaining a current segmentation point corresponding to the 100 th iteration, determining a quantization error of the current segmentation point, and if the quantization error is smaller than an error threshold corresponding to the 100 th iteration process, determining the current segmentation point corresponding to the 100 th iteration as a historical segmentation point obtained after the last iteration; and if the quantization error is greater than or equal to the error threshold corresponding to the 100 th iteration process, determining the historical segmentation point used in the 100 th iteration process as the historical segmentation point obtained after the last iteration. Namely, in the above-mentioned 100 iteration processes, a target segmentation point with the minimum quantization error is determined, and the target segmentation point is the selected optimal segmentation point.
In the embodiment of the disclosure, by means of multiple iterations, based on the quantization error of the current segment point determined in each iteration process, the target segment point corresponding to the feature data of the network processing layer to be quantized can be determined more accurately.
In implementation, the feature data of the network processing layer to be quantized includes at least one of weight feature data, input feature data, and output feature data.
Under the condition that the feature data comprises weight feature data, the weight feature data of the network processing layer can be obtained, and the target segmentation point corresponding to the weight feature data of the network processing layer to be quantized is determined based on the weight feature data and the quantization error expression. The determination process may refer to the processes of step a1 through step a5, which are not described in detail herein.
In the case that the feature data includes input feature data and/or output feature data, before determining a target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and the quantization error expression, the method further includes: acquiring calibration data, wherein the calibration data comprises at least one calibration image; and for each calibration image, determining characteristic data which is matched with the calibration image and corresponds to the network processing layer to be quantized based on the calibration image and the neural network to be quantized.
The calibration data can be selected from the test samples corresponding to the neural network to be quantized and includes at least one calibration image. Each calibration image is input into the neural network to be quantized; each network processing layer included in the network extracts features of the calibration image, yielding input feature data and output feature data that correspond to each network processing layer and match the calibration image. In this way, the feature data corresponding to the network processing layer to be quantized and matching the calibration image is obtained.
When the feature data includes input feature data and/or output feature data, since the input feature data and the output feature data are related to an inference process of the neural network, in order to predetermine target segmentation points corresponding to the input feature data and the output feature data, calibration data may be obtained, and for each calibration image, based on the calibration image and the neural network to be quantized, feature data corresponding to a network processing layer to be quantized and matching the calibration image is determined, so as to implement determination of the target segmentation points on the basis of the feature data corresponding to the calibration image.
In an alternative embodiment, determining a target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and a quantization error expression includes:
for each calibration image, determining candidate segmentation points corresponding to the feature data of the calibration image based on the quantization error expression and the feature data corresponding to the network processing layer to be quantized and matched with the calibration image; and determining a target segmentation point corresponding to the network processing layer to be quantized based on the candidate segmentation points corresponding to the characteristic data of each calibration image.
Taking feature data that includes output feature data as an example: for each calibration image, the output feature data corresponding to the network processing layer to be quantized and matching the calibration image is determined, and a candidate segmentation point corresponding to the output feature data of that calibration image is determined based on the quantization error expression and this output feature data.
For example, for output feature data of each calibration image in a network processing layer, an output feature interval of the output feature data may be determined first, and a plurality of output segmentation points may be selected from the output feature interval; then, for each output segment point, based on the output segment point, performing linear quantization on the output characteristic data (for example, performing linear quantization by using formula 2 and formula 3) to obtain quantized output data corresponding to the output characteristic data at the output segment point; and then, determining the quantization error between the output characteristic data and the quantized output data by using a quantization error expression (such as a formula 4) to obtain an initial quantization error corresponding to the output segmentation point. Therefore, the initial quantization errors corresponding to the output segmentation points can be obtained. And determining candidate segmentation points from the plurality of output segmentation points according to the initial quantization error, for example, selecting the output segmentation point with the minimum initial quantization error as the candidate segmentation point.
For another example, the candidate segmentation points may be determined by multiple iterative selections. Specifically, the method may include: step 1, determining the first iteration times as current iteration times, determining a characteristic value with the maximum absolute value in output characteristic data as a target output characteristic value, determining the target output characteristic value as a historical segmentation point, determining an initialization error as an error threshold, and determining a current output segmentation point corresponding to the current iteration times based on the historical segmentation point and the total iteration times. And 2, determining an output quantization error corresponding to the current output segmentation point based on the current output segmentation point, the output characteristic data and the quantization error expression.
Step 3, under the condition that the output quantization error is smaller than the error threshold, adding one to the current iteration frequency to obtain the updated current iteration frequency; and determining the output quantization error as an updated error threshold, determining the current output segmentation point as an updated historical segmentation point, returning to the step of determining the current output segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number until the updated current iteration number is greater than the total iteration number. And 4, under the condition that the output quantization error corresponding to the current output segmentation point is greater than or equal to the error threshold, adding one to the current iteration number to obtain the updated current iteration number, and returning to the step of determining the current output segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number until the updated current iteration number is greater than the total iteration number. And 5, determining the historical segmentation points obtained after the last iteration as candidate segmentation points corresponding to the output characteristic data of the network processing layer to be quantized.
The process of determining candidate segmentation points (i.e., steps 1 to 5 above) may refer to the detailed description of steps a1 to a5, and will not be described in detail here.
And then, determining a target segmentation point corresponding to the output characteristic data of the network processing layer to be quantized based on the candidate segmentation points corresponding to the output characteristic data of each calibration image. For example, each candidate segmentation point may be averaged to obtain a target segmentation point; or, a candidate segmentation point may be randomly selected from the plurality of candidate segmentation points as a target segmentation point; or clustering the candidate segmentation points to obtain at least one segmentation point set (each segmentation point set comprises at least one candidate segmentation point), determining the target segmentation point set with the largest number of the candidate segmentation points, determining the average value of the candidate segmentation points in the target segmentation point set, and determining the average value as the target segmentation point.
In the embodiment of the present disclosure, when there are a plurality of calibration images, the candidate segmentation point corresponding to each calibration image may be determined, and then the target segmentation point corresponding to the network processing layer to be quantized is determined more accurately based on the candidate segmentation points corresponding to the feature data of each calibration image.
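A sketch of the three aggregation strategies above (a hypothetical helper; the cluster branch uses a simple 1-D histogram in place of a full clustering algorithm):

```python
import numpy as np

def aggregate_candidates(cands, mode="mean"):
    """Combine per-calibration-image candidate segmentation points into one
    target segmentation point for the network processing layer."""
    cands = np.asarray(cands, dtype=float)
    if mode == "mean":
        return float(cands.mean())
    if mode == "random":
        return float(np.random.choice(cands))
    if mode == "cluster":
        # bucket the candidates, keep the fullest bucket, average its members
        hist, edges = np.histogram(cands, bins=min(len(cands), 10))
        k = int(hist.argmax())
        members = cands[(cands >= edges[k]) & (cands <= edges[k + 1])]
        return float(members.mean())
    raise ValueError(f"unknown mode: {mode}")
```

The cluster variant is the most robust of the three when one calibration image yields an outlier candidate, since the outlier's bucket is discarded.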
After the execution subject (such as a server or a terminal device) determines the target segmentation point of the feature data, if the feature data includes weight feature data, the target segmentation point corresponding to the weight feature data of the network processing layer to be quantized may be obtained, and the weight feature data of that layer may be quantized based on this target segmentation point to obtain quantized weight feature data. In the inference process of the neural network, the quantized weight feature data is then used for data processing to obtain the output feature data of the network processing layer. The quantization of the weight feature data can be carried out in real time during inference, or in advance before inference. The weight feature data of a network processing layer of the neural network to be quantized can be quantized by using formula (2) and formula (3), where α in formula (2) is the target segmentation point.
If the feature data includes input feature data, the target segmentation point corresponding to the input feature data of the network processing layer to be quantized may be obtained, and the input feature data may be quantized based on that target segmentation point, so that the quantized input feature data is used for data processing in the inference process of the neural network to obtain the output feature data of the network processing layer. The quantization of the input feature data may be performed in real time during the inference process of the neural network. The input feature data may likewise be quantized using formula (2) and formula (3), where α in formula (2) is the target segmentation point.
If the feature data includes both weight feature data and input feature data, a target segmentation point 1 corresponding to the weight feature data and a target segmentation point 2 corresponding to the input feature data of the network processing layer to be quantized may be obtained. Based on target segmentation point 1 and target segmentation point 2, the weight feature data and the input feature data used by the network processing layer are quantized separately during the inference process of the neural network to be quantized, yielding quantized weight feature data and quantized input feature data. Feature processing is then performed with the quantized data; for example, if the network processing layer is a convolutional layer, a convolution operation is performed on the quantized weight feature data and the quantized input feature data to obtain the output feature data of the network processing layer.
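Since formulas (2) and (3) are not reproduced in this passage, the sketch below substitutes a standard symmetric linear quantizer clipped at the target segmentation point α, and uses a matrix product as a stand-in for the convolution of a convolutional layer; all names are illustrative assumptions:

```python
import numpy as np

def linear_quantize(x, alpha, n_bits=8):
    """Symmetric linear quantization clipped at threshold alpha.

    A stand-in for the disclosure's formulas (2) and (3); alpha plays
    the role of the target segmentation point.
    """
    qmax = 2 ** (n_bits - 1) - 1               # 127 for int8
    scale = alpha / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)   # weight feature data
x = rng.standard_normal((8, 3)).astype(np.float32)   # input feature data

# Target segmentation points 1 and 2 (here simply the max magnitudes).
qw, sw = linear_quantize(w, alpha=float(np.abs(w).max()))
qx, sx = linear_quantize(x, alpha=float(np.abs(x).max()))

# A matrix product stands in for the convolution of a conv layer.
y = qw.astype(np.int32) @ qx.astype(np.int32)        # integer accumulation
y_fp = y * (sw * sx)                                 # dequantized output
```

The integer accumulation followed by a single floating-point rescale mirrors how quantized layers are typically executed: only the cheap int8 multiplications run per element, and the scales are folded into one constant.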
In particular, after the target segmentation points of the feature data of the network processing layer to be quantized are determined, such as the target segmentation points of the weight feature data and of the output feature data, the inference process of the neural network to be quantized may be run on a chip (e.g., an Artificial Intelligence (AI) chip) based on those target segmentation points.
For example, when the chip runs the inference process of the neural network to be quantized, for a network processing layer of that network, the chip may determine the target segmentation points of the feature data (such as the weight feature data and the input feature data) of the layer, or may receive those target segmentation points. A quantization operation module in the processor of the chip then performs linear quantization on the weight feature data based on the target segmentation point of the weight feature data to obtain quantized weight feature data, for example quantizing its data type from floating-point float32 to integer int8, and performs linear quantization on the input feature data based on the target segmentation point of the input feature data to obtain quantized input feature data.
Then, a feature operation module in the processor of the chip, which executes the operation process of the network processing layer, operates on the quantized weight feature data and the quantized input feature data to obtain the output feature data. Assume the data type of the output feature data is int8 (i.e., the output feature data is quantized). The output feature data is stored in a memory, or used as the input feature data of the next network processing layer for subsequent processing. The quantization operation module may include the operation devices required for quantization processing, such as an adder and a multiplier; the feature operation module may include the operation devices required for the layer's operation process, for example, when the network processing layer is a convolutional layer, an adder, a multiplier, an accumulator, and the like.
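One common way to keep a layer's output in int8, as assumed above, is to requantize the integer accumulator onto the int8 grid expected by the next layer. The rescaling scheme and names below are a conventional implementation choice, not taken from the disclosure:

```python
import numpy as np

def requantize(acc_int32, scale_in, scale_out):
    """Map int32 accumulator values onto the int8 output grid.

    scale_in is the accumulator's effective scale (weight scale times
    input scale); scale_out is the scale chosen for the int8 output.
    """
    m = scale_in / scale_out                 # combined rescale factor
    q = np.round(acc_int32 * m)
    return np.clip(q, -127, 127).astype(np.int8)

acc = np.array([5000, -12000, 300], dtype=np.int32)
out = requantize(acc, scale_in=0.0005, scale_out=0.05)
```

On real AI chips the factor `m` is usually implemented as a fixed-point multiplier plus shift so the whole path stays integer-only; the floating-point multiply here is just the readable version of the same idea.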
By determining the target segmentation point of the feature data, the chip can quantize the feature data accurately to obtain quantized feature data. While the inference accuracy is preserved, the bit precision of the feature data is reduced, so the bit width required to store the quantized feature data on the chip is smaller. Processing the quantized feature data on the chip therefore consumes fewer operation resources, reduces operation power consumption, and increases the calculation speed, making the processing of the neural network to be quantized more efficient.
Based on the same concept, the embodiments of the present disclosure also provide a neural network processing method, which may be applied to a chip such as an AI chip. The method includes steps S201-S203, wherein:
S201, acquiring a target segmentation point corresponding to at least one network processing layer to be quantized in a neural network to be quantized;
S202, quantizing the feature data of the network processing layer to be quantized based on the target segmentation point corresponding to the network processing layer to be quantized, to obtain quantized feature data;
S203, determining an inference result of the neural network to be quantized based on the quantized feature data.
Here, the target segmentation point of at least one network processing layer to be quantized in the neural network to be quantized is obtained; it may be determined using the neural network quantization method described above, and may include, for example, the target segmentation point of the layer's weight feature data and the target segmentation point of its input feature data. Then, based on the target segmentation point corresponding to the network processing layer to be quantized, the feature data of that layer is quantized to obtain quantized feature data; for example, the weight feature data and the input feature data are quantized to obtain quantized weight feature data and quantized input feature data.
The inference result of the neural network to be quantized is then determined from the quantized feature data. For example, an image to be detected may be input to the neural network to be quantized, and each network processing layer processes the image to obtain output feature data. Finally, the inference result may be determined based on the output feature data of the last network processing layer; alternatively, the output feature data of the last network processing layer may itself be determined as the inference result. For example, when the neural network to be quantized performs face recognition, the inference result may be a face recognition result.
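Steps S201-S203 can be illustrated with a toy loop over fully connected layers; the clipped "fake-quantization" helper and all names are assumptions for illustration only:

```python
import numpy as np

def fake_quantize(x, alpha, n_bits=8):
    """Quantize to the int8 grid clipped at alpha, then dequantize."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = alpha / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 8)) for _ in range(3)]

# S201: one target segmentation point per layer (here the max magnitude).
points = [float(np.abs(w).max()) for w in layers]

x = rng.standard_normal(8)
for w, alpha in zip(layers, points):
    qw = fake_quantize(w, alpha)       # S202: quantize the layer's weights
    x = np.maximum(qw @ x, 0.0)        # run the layer (ReLU activation)

inference_result = x                   # S203: output of the last layer
```

The point of the sketch is the data flow: the target segmentation points are fixed before inference, and each layer consumes quantized data produced under its own point.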
Here, the target segmentation point of the network processing layer to be quantized, determined according to the neural network quantization method described above, is accurate. Based on it, the feature data of the network processing layer to be quantized can be quantized more accurately to obtain quantized feature data, and the inference result of the neural network to be quantized can in turn be determined more accurately. Thus, while the inference accuracy is preserved, reducing the bit precision of the feature data improves the processing efficiency of the execution subject running the neural network to be quantized and reduces the computational resources consumed.
It will be understood by those skilled in the art that, in the above method of the present embodiment, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same concept, an embodiment of the present disclosure further provides a neural network quantization apparatus, as shown in fig. 3, which is an architecture schematic diagram of the neural network quantization apparatus provided in the embodiment of the present disclosure, and includes a first obtaining module 301 and a first determining module 302, specifically:
a first obtaining module 301, configured to obtain a neural network to be quantized;
a first determining module 302, configured to determine, for any network processing layer to be quantized in the neural network to be quantized, a target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and a quantization error expression; the quantization error expression is used for determining an error between the feature data and quantized data corresponding to the feature data, so that, in the inference process of the neural network to be quantized, the feature data used in the operation of the network processing layer is quantized based on the target segmentation point corresponding to the network processing layer to be quantized.
In one possible implementation, the first determining module 302, when determining the target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and a quantization error expression, is configured to:
determining an initial iteration number as the current iteration number, determining a target characteristic value as a historical segmentation point, determining an initialization error as an error threshold, and determining a current segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number; wherein the target characteristic value is the characteristic value with the largest absolute value in the feature data;
determining a quantization error corresponding to the current segmentation point based on the current segmentation point, the feature data and the quantization error expression;
under the condition that the quantization error is smaller than the error threshold, adding one to the current iteration number to obtain an updated current iteration number; determining the quantization error as an updated error threshold, determining the current segmentation point as an updated historical segmentation point, and returning to the step of determining the current segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number, until the updated current iteration number is greater than the total iteration number;
and determining the historical segmentation point obtained after the last iteration as a target segmentation point corresponding to the feature data of the network processing layer to be quantized.
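The iterative search performed by this implementation can be sketched as follows. The text does not spell out how the current segmentation point is derived from the historical segmentation point and the iteration counts, so a geometric shrink is assumed here, and a mean-squared error over a clipped linear quantizer stands in for the quantization error expression:

```python
import numpy as np

def quant_error(x, alpha, n_bits=8):
    """MSE between x and its clipped linear quantization at threshold alpha,
    a stand-in for the disclosure's quantization error expression."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = alpha / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return float(np.mean((x - q * scale) ** 2))

def search_target_point(x, total_iters=50):
    amax = float(np.max(np.abs(x)))    # target characteristic value (max |x|)
    hist = amax                        # historical segmentation point
    err_thr = float("inf")             # initialization error threshold
    i = 1                              # current iteration number
    while i <= total_iters:
        # Candidate derived from the historical point; the exact update rule
        # is not given in the text, so a geometric shrink is assumed.
        cur = hist * (1.0 - 1.0 / total_iters)
        err = quant_error(x, cur)
        if err < err_thr:              # improvement: accept and tighten threshold
            err_thr, hist = err, cur
        i += 1
    return hist                        # historical point after the last iteration
```

The control flow mirrors the description: accept a candidate only when its quantization error beats the current threshold, and report the historical segmentation point remaining after the last iteration.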
In a possible implementation, the first determining module 302 is further configured to:
under the condition that the quantization error corresponding to the current segmentation point is larger than or equal to the error threshold, adding one to the current iteration number to obtain an updated current iteration number, and returning to the step of determining the current segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number until the updated current iteration number is larger than the total iteration number;
and determining the current segmentation point obtained after the last iteration as a target segmentation point corresponding to the feature data of the network processing layer to be quantized.
In a possible implementation, the first determining module 302, when determining the quantization error corresponding to the current segmentation point based on the current segmentation point, the feature data and the quantization error expression, is configured to:
determining a quantized feature value corresponding to each feature value in the feature data based on the current segmentation point and the set quantization expression;
and determining the quantization error corresponding to the current segmentation point based on each characteristic value in the characteristic data, the quantized characteristic value corresponding to each characteristic value and the quantization error expression.
In a possible implementation manner, the first determining module 302, when determining the quantized feature value corresponding to each feature value in the feature data based on the current segmentation point and the set quantized expression, is configured to:
determining a first segmentation interval and a second segmentation interval corresponding to the feature data based on the current segmentation point; wherein the first segmentation interval comprises: the interval between the negative value of the current segmentation point and the positive value of the current segmentation point; and the second segmentation interval comprises: the interval between the positive value of the current segmentation point and the positive value of the target characteristic value, and the interval between the negative value of the target characteristic value and the negative value of the current segmentation point;
for each feature value in the feature data, in a case that the feature value is located in the first segmentation interval, determining a quantized feature value corresponding to the feature value by using a first quantization expression corresponding to the first segmentation interval;
and in a case that the feature value is located in the second segmentation interval, determining a quantized feature value corresponding to the feature value by using a second quantization expression corresponding to the second segmentation interval.
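A minimal sketch of such two-interval quantization is given below. The disclosure does not give the two quantization expressions, so the split of the int8 integer range between the inner interval (-alpha, alpha) and the outer interval [alpha, amax] (mirrored for negative values) is an assumption:

```python
import numpy as np

def two_segment_quantize(x, alpha, amax, n_bits=8, inner_levels=96):
    """Piecewise quantization over two intervals split at alpha.

    Spends `inner_levels` integer steps on (-alpha, alpha) and the
    remaining steps on [alpha, amax], mirrored for negative values.
    """
    qmax = 2 ** (n_bits - 1) - 1
    s_in = alpha / inner_levels                       # fine step inside
    s_out = (amax - alpha) / (qmax - inner_levels)    # coarse step outside
    ax, sign = np.abs(x), np.sign(x)
    inner = ax < alpha
    q = np.where(inner,
                 np.round(ax / s_in),                 # first interval
                 inner_levels + np.round((np.minimum(ax, amax) - alpha) / s_out))
    return (sign * np.clip(q, 0, qmax)).astype(np.int8)

def two_segment_dequantize(q, alpha, amax, n_bits=8, inner_levels=96):
    qmax = 2 ** (n_bits - 1) - 1
    s_in = alpha / inner_levels
    s_out = (amax - alpha) / (qmax - inner_levels)
    qi = q.astype(np.int32)
    aq, sign = np.abs(qi), np.sign(qi)
    return sign * np.where(aq <= inner_levels,
                           aq * s_in,
                           alpha + (aq - inner_levels) * s_out)
```

Allocating more integer levels to the dense inner interval gives it a finer step than a single uniform grid would, which is the usual motivation for splitting the range at a segmentation point.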
In a possible implementation manner, in a case that the feature data includes input feature data and/or output feature data, the apparatus further includes an extracting module 303, configured, before the target segmentation point corresponding to the feature data of the network processing layer to be quantized is determined based on the feature data of the network processing layer to be quantized and the quantization error expression, to:
acquiring calibration data, wherein the calibration data comprises at least one calibration image;
for each calibration image, determining feature data corresponding to the network processing layer to be quantized and matched with the calibration image based on the calibration image and the neural network to be quantized.
In one possible implementation, the first determining module 302, when determining the target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and a quantization error expression, is configured to:
for each calibration image, determining candidate segmentation points corresponding to the feature data of the calibration image based on the quantization error expression and the feature data corresponding to the network processing layer to be quantized and matched with the calibration image;
and determining a target segmentation point corresponding to the network processing layer to be quantized based on the candidate segmentation points corresponding to the feature data of each calibration image.
Based on the same concept, an embodiment of the present disclosure further provides a neural network processing apparatus, as shown in fig. 4, which is an architecture schematic diagram of the neural network processing apparatus provided in the embodiment of the present disclosure, and includes a second obtaining module 401, a quantizing module 402, and a second determining module 403, specifically:
a second obtaining module 401, configured to obtain a target segmentation point corresponding to at least one to-be-quantized network processing layer in a to-be-quantized neural network;
a quantization module 402, configured to perform quantization processing on the feature data of the network processing layer to be quantized based on the target segmentation point corresponding to the network processing layer to be quantized, so as to obtain quantized feature data;
a second determining module 403, configured to determine an inference result of the neural network to be quantized based on the quantized feature data.
In some embodiments, the functions of the apparatus provided in the embodiments of the present disclosure, or the modules it includes, may be used to execute the methods described in the above method embodiments; for specific implementation, reference may be made to the description of those method embodiments, and for brevity, details are not repeated here.
Based on the same technical concept, an embodiment of the present disclosure further provides an electronic device. Referring to fig. 5, a schematic structural diagram of an electronic device provided in the embodiment of the present disclosure includes a processor 501, a storage 502, and a bus 503. The storage 502 is used for storing execution instructions and includes a memory 5021 and an external storage 5022. The memory 5021, also referred to as an internal memory, temporarily stores operation data of the processor 501 and data exchanged with the external storage 5022, such as a hard disk; the processor 501 exchanges data with the external storage 5022 through the memory 5021. When the electronic device 500 operates, the processor 501 and the storage 502 communicate through the bus 503, so that the processor 501 executes the following instructions:
acquiring a neural network to be quantized;
for any network processing layer to be quantized in the neural network to be quantized, determining a target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and a quantization error expression; the quantization error expression is used for determining the error between the feature data and the quantized data corresponding to the feature data, so that, in the inference process of the neural network to be quantized, the feature data used in the operation of the network processing layer is quantized based on the target segmentation point corresponding to the network processing layer to be quantized.
Alternatively, the processor 501 executes the following instructions:
acquiring a target segmentation point corresponding to at least one network processing layer to be quantized in a neural network to be quantized;
quantizing the feature data of the network processing layer to be quantized based on the target segmentation point corresponding to the network processing layer to be quantized to obtain quantized feature data;
and determining an inference result of the neural network to be quantized based on the quantized feature data.
The specific processing flow of the processor 501 may refer to the description of the above method embodiment, and is not described herein again.
In addition, the present disclosure also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the neural network quantization method and the neural network processing method described in the above method embodiments are performed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure further provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the neural network quantization method and the neural network processing method in the foregoing method embodiments, which may be specifically referred to in the foregoing method embodiments and are not described herein again.
The computer program product may be implemented by hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.

In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only one logical division, and other divisions are possible in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
If the technical solution of the present application involves personal information, a product applying this technical solution clearly informs the user of the personal information processing rules and obtains the individual's separate consent before processing the personal information. If the technical solution involves sensitive personal information, the product obtains the individual's separate consent before processing that information and additionally satisfies the requirement of "express consent". For example, at a personal information collection device such as a camera, a clear and prominent notice is set up to inform the user that they are entering a personal information collection range and that personal information will be collected; if the individual voluntarily enters the collection range, they are deemed to consent to the collection of their personal information. Alternatively, on a device that processes personal information, with the personal information processing rules clearly indicated, personal authorization is obtained by means of a pop-up window or by asking the individual to upload their personal information themselves. The personal information processing rules may include information such as the personal information processor, the purpose of processing, the processing method, and the types of personal information processed.
The above are only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto; any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present disclosure shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (12)

1. A neural network quantization method, comprising:
acquiring a neural network to be quantized;
for any network processing layer to be quantized in the neural network to be quantized, determining a target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and a quantization error expression; wherein the quantization error expression is used for determining the error between the feature data and the quantized data corresponding to the feature data, so that, in the inference process of the neural network to be quantized, quantization processing is performed on the neural network to be quantized based on the target segmentation point corresponding to the network processing layer to be quantized.
2. The method according to claim 1, wherein the determining a target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and a quantization error expression comprises:
determining an initial iteration number as the current iteration number, determining a target characteristic value as a historical segmentation point, determining an initialization error as an error threshold, and determining a current segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number; wherein the target characteristic value is the characteristic value with the largest absolute value in the feature data;
determining a quantization error corresponding to the current segmentation point based on the current segmentation point, the feature data and the quantization error expression;
under the condition that the quantization error is smaller than the error threshold, adding one to the current iteration number to obtain an updated current iteration number; determining the quantization error as an updated error threshold, determining the current segmentation point as an updated historical segmentation point, and returning to the step of determining the current segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number, until the updated current iteration number is greater than the total iteration number;
and determining the historical segmentation point obtained after the last iteration as a target segmentation point corresponding to the feature data of the network processing layer to be quantized.
3. The method of claim 2, further comprising:
under the condition that the quantization error corresponding to the current segmentation point is larger than or equal to the error threshold, adding one to the current iteration number to obtain an updated current iteration number, and returning to the step of determining the current segmentation point corresponding to the current iteration number based on the historical segmentation point and the total iteration number until the updated current iteration number is larger than the total iteration number;
and determining the current segmentation point obtained after the last iteration as a target segmentation point corresponding to the feature data of the network processing layer to be quantized.
4. The method according to claim 2 or 3, wherein the determining the quantization error corresponding to the current segmentation point based on the current segmentation point, the feature data and the quantization error expression comprises:
determining a quantized feature value corresponding to each feature value in the feature data based on the current segmentation point and the set quantization expression;
and determining the quantization error corresponding to the current segmentation point based on each characteristic value in the characteristic data, the quantized characteristic value corresponding to each characteristic value and the quantization error expression.
5. The method according to claim 4, wherein the determining a quantized feature value corresponding to each feature value in the feature data based on the current segmentation point and the set quantization expression comprises:
determining a first segmentation interval and a second segmentation interval corresponding to the feature data based on the current segmentation point; wherein the first segmentation interval comprises: the interval between the negative value of the current segmentation point and the positive value of the current segmentation point; and the second segmentation interval comprises: the interval between the positive value of the current segmentation point and the positive value of the target characteristic value, and the interval between the negative value of the target characteristic value and the negative value of the current segmentation point;
for each feature value in the feature data, under the condition that the feature value is located in the first segmentation interval, determining a quantized feature value corresponding to the feature value by using a first quantization expression corresponding to the first segmentation interval;
and under the condition that the feature value is located in the second segmentation interval, determining a quantized feature value corresponding to the feature value by using a second quantization expression corresponding to the second segmentation interval.
6. The method according to any one of claims 1 to 5, wherein in a case that the feature data includes input feature data and/or output feature data, before the determining a target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and a quantization error expression, the method further comprises:
acquiring calibration data, wherein the calibration data comprises at least one calibration image;
for each calibration image, determining feature data corresponding to the network processing layer to be quantized and matched with the calibration image based on the calibration image and the neural network to be quantized.
7. The method according to claim 6, wherein the determining a target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and a quantization error expression comprises:
for each calibration image, determining a candidate segmentation point corresponding to the feature data of the calibration image based on the quantization error expression and the feature data of the network processing layer to be quantized that corresponds to the calibration image;
and determining the target segmentation point corresponding to the network processing layer to be quantized based on the candidate segmentation points respectively corresponding to the feature data of the calibration images.
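The per-image search and aggregation of claims 6 and 7 can be sketched as a grid search followed by averaging. The MSE error expression, the candidate grid, the compact two-segment quantizer, and the mean aggregation below are all assumptions for illustration; the patent leaves the concrete quantization error expression and aggregation rule unspecified here.

```python
import numpy as np

def _quantize(x, p, n_bits=8):
    # compact two-segment quantizer, used only to evaluate the error
    t = float(np.max(np.abs(x)))
    levels = 2 ** (n_bits - 1) - 1
    half = levels // 2
    s1 = p / half
    q = np.round(np.clip(x, -p, p) / s1) * s1
    outer = np.abs(x) > p
    if outer.any():
        s2 = (t - p) / (levels - half)
        q[outer] = np.sign(x[outer]) * (p + np.round((np.abs(x[outer]) - p) / s2) * s2)
    return q

def candidate_segmentation_point(features, n_candidates=32):
    """Pick the candidate p minimizing MSE (an assumed error expression)."""
    t = float(np.max(np.abs(features)))
    candidates = np.linspace(t / n_candidates, t * (1 - 1 / n_candidates), n_candidates)
    errors = [np.mean((features - _quantize(features, p)) ** 2) for p in candidates]
    return float(candidates[int(np.argmin(errors))])

def target_segmentation_point(per_image_features):
    """Aggregate per-calibration-image candidates (here: their mean, an assumption)."""
    return float(np.mean([candidate_segmentation_point(f) for f in per_image_features]))
```

In practice the per-image feature tensors would come from running the calibration images through the network up to the layer being calibrated, as claim 6 describes.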
8. A neural network processing method, comprising:
acquiring a target segmentation point corresponding to at least one network processing layer to be quantized in a neural network to be quantized;
quantizing the feature data of the network processing layer to be quantized based on the target segmentation point corresponding to the network processing layer to be quantized to obtain quantized feature data;
and determining an inference result of the neural network to be quantized based on the quantized feature data.
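A minimal sketch of the processing flow in claim 8, assuming the target segmentation points were computed offline. The list-of-callables layer model, the dict keyed by layer index, and the simple uniform stand-in quantizer are illustrative assumptions; the patent's segmented quantizer would be substituted for `quantize_features`.

```python
import numpy as np

def quantize_features(x, p, n_bits=8):
    # simple symmetric stand-in quantizer with clipping threshold p
    scale = p / (2 ** (n_bits - 1) - 1)
    return np.clip(np.round(x / scale) * scale, -p, p)

def run_quantized_inference(layers, seg_points, x):
    """Quantize the feature data entering each to-be-quantized layer,
    run every layer in order, and return the inference result."""
    for i, layer in enumerate(layers):
        if i in seg_points:  # layer i is a network processing layer to be quantized
            x = quantize_features(x, seg_points[i])
        x = layer(x)
    return x
```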
9. An apparatus for neural network quantization, comprising:
the first acquisition module is used for acquiring a neural network to be quantized;
the first determination module is used for determining, for any network processing layer to be quantized in the neural network to be quantized, a target segmentation point corresponding to the feature data of the network processing layer to be quantized based on the feature data of the network processing layer to be quantized and a quantization error expression; wherein the quantization error expression is used for determining the error between the feature data and the quantized data corresponding to the feature data, so that, during inference of the neural network to be quantized, the neural network to be quantized is quantized based on the target segmentation point corresponding to the network processing layer to be quantized.
10. A neural network processing apparatus, comprising:
the second acquisition module is used for acquiring a target segmentation point corresponding to at least one network processing layer to be quantized in the neural network to be quantized;
the quantization module is used for performing quantization processing on the feature data of the network processing layer to be quantized based on the target segmentation point corresponding to the network processing layer to be quantized to obtain quantized feature data;
and the second determination module is used for determining an inference result of the neural network to be quantified based on the quantified characteristic data.
11. An electronic device, comprising: a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate via the bus when the electronic device operates, and the machine-readable instructions, when executed by the processor, perform the steps of the neural network quantization method according to any one of claims 1 to 7, or the steps of the neural network processing method according to claim 8.
12. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, performs the steps of the neural network quantization method according to any one of claims 1 to 7, or the steps of the neural network processing method according to claim 8.
CN202210911217.2A 2022-07-29 2022-07-29 Neural network quantization and processing method and device, electronic equipment and storage medium Pending CN115034389A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210911217.2A CN115034389A (en) 2022-07-29 2022-07-29 Neural network quantization and processing method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115034389A true CN115034389A (en) 2022-09-09

Family

ID=83131234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210911217.2A Pending CN115034389A (en) 2022-07-29 2022-07-29 Neural network quantization and processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115034389A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116153389A (en) * 2023-04-21 2023-05-23 之江实验室 Method, apparatus, device and storage medium for quantifying protein language model


Similar Documents

Publication Publication Date Title
EP3474194B1 (en) Method and apparatus with neural network parameter quantization
CN108345939B (en) Neural network based on fixed-point operation
CN108304921B (en) Convolutional neural network training method and image processing method and device
CN110689109A (en) Neural network method and apparatus
Langroudi et al. Cheetah: Mixed low-precision hardware & software co-design framework for dnns on the edge
CN111553215B (en) Personnel association method and device, graph roll-up network training method and device
CN111062475A (en) Method and device for quantifying parameters of a neural network
CN112784778B (en) Method, apparatus, device and medium for generating model and identifying age and sex
CN111105017B (en) Neural network quantization method and device and electronic equipment
CN110929836B (en) Neural network training and image processing method and device, electronic equipment and medium
CN111178514A (en) Neural network quantification method and system
JP2022507704A (en) Adaptive quantization methods and devices, devices, media
CN110647974A (en) Network layer operation method and device in deep neural network
US11574193B2 (en) Method and system for training of neural networks using continuously differentiable models
CN115034389A (en) Neural network quantization and processing method and device, electronic equipment and storage medium
CN113126953A (en) Method and apparatus for floating point processing
CN115496144A (en) Power distribution network operation scene determining method and device, computer equipment and storage medium
CN114444686A (en) Method and device for quantizing model parameters of convolutional neural network and related device
US20240005157A1 (en) Methods and systems for unstructured pruning of a neural network
CN111340223A (en) Neural network compression method, target detection method, driving control method and device
CN112686384A (en) Bit-width-adaptive neural network quantization method and device
CN112613604A (en) Neural network quantification method and device
CN112257689A (en) Training and recognition method of face recognition model, storage medium and related equipment
CN112819157A (en) Neural network training method and device and intelligent driving control method and device
CN114139678A (en) Convolutional neural network quantization method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination