CN115906971A - Neural network activation method and device, NPU, equipment and storage medium


Info

Publication number
CN115906971A
Authority
CN
China
Prior art keywords
target
interval
parameter
primary
lookup table
Legal status
Pending
Application number
CN202211534860.4A
Other languages
Chinese (zh)
Inventor
罗杰
姜坤
Current Assignee
Zeku Technology Shanghai Corp Ltd
Original Assignee
Zeku Technology Shanghai Corp Ltd
Application filed by Zeku Technology Shanghai Corp Ltd
Priority to CN202211534860.4A
Publication of CN115906971A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application discloses a neural network activation method and device, an NPU, equipment and a storage medium, and relates to the field of artificial intelligence. The method comprises the following steps: determining a target primary interval to which an input value belongs from a primary lookup table, where the primary lookup table comprises the correspondence between primary intervals and interval parameters; determining, based on the target interval parameter of the target primary interval, the target conversion parameter of the target secondary interval to which the input value belongs from a secondary lookup table, where the secondary lookup table comprises the correspondence between secondary intervals and conversion parameters, and the conversion parameters are obtained by converting the fitting parameters of the linear fitting of the activation function corresponding to each secondary interval; and determining an activation value based on the target conversion parameter and the input value through a multiplexed quantization algorithm module, where the quantization algorithm module is used for data quantization and inverse quantization processing. The embodiment of the application multiplexes the quantization calculation module to perform activation calculation, reducing the improvement cost of the activation method.

Description

Neural network activation method and device, NPU, equipment and storage medium
Technical Field
The embodiment of the application relates to the field of artificial intelligence, in particular to a neural network activation method and device, an NPU, equipment and a storage medium.
Background
With the development of deep learning, neural networks are widely applied in various fields, and model scale keeps growing wider and deeper. In order to reduce the occupation of computing resources and broaden the application scenarios of neural networks, the related art reduces model size and saves storage space through quantization while keeping the model structure unchanged.
In order to complete neural network calculation on an NPU (Neural-network Processing Unit), the related art mostly uses a Look-Up Table (LUT) to simplify the calculation process. However, when activation calculation is performed with a self-optimized LUT, the NPU needs to be adaptively recoded, which consumes resources.
Disclosure of Invention
The embodiment of the application provides a neural network activation method and device, an NPU, equipment and a storage medium, which realize multiplexing of a quantization calculation module and simplify the improvement process while improving the activation calculation effect. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a method for activating a neural network, where the method includes:
determining a target primary interval to which an input value belongs from a primary lookup table, wherein the primary lookup table comprises a corresponding relation between the primary interval and interval parameters, the primary interval is obtained by dividing an input value range, the input value is a fixed point number, and the input value range is a fixed point number range;
determining a target conversion parameter of a target secondary interval to which the input value belongs from a secondary lookup table based on a target interval parameter of the target primary interval, wherein the target secondary interval belongs to the target primary interval, each primary interval is divided into at least one secondary interval, the secondary lookup table comprises a corresponding relation between the secondary interval and the conversion parameter, and the conversion parameter is obtained by converting a fitting parameter of linear fitting of an activation function corresponding to the secondary interval;
and determining an activation value corresponding to the input value based on the target conversion parameter and the input value through a multiplexing quantization algorithm module, wherein the activation value is a fixed point number, and the quantization algorithm module is used for data quantization and inverse quantization processing.
In another aspect, an embodiment of the present application provides an apparatus for activating a neural network, where the apparatus includes:
the device comprises a first lookup table unit, a second lookup table unit and a calculation unit, wherein the calculation unit is respectively connected with the first lookup table unit and the second lookup table unit:
the first lookup table unit is used for storing a first-level lookup table, the first-level lookup table comprises a corresponding relation between a first-level interval and an interval parameter, the first-level interval is obtained by dividing an input value range, the input value is a fixed point number, and the input value range is a fixed point number range;
the second lookup table unit is configured to store a second lookup table, where the second lookup table includes a correspondence between the secondary interval and a conversion parameter, and the conversion parameter is obtained by converting a fitting parameter of linear fitting of an activation function corresponding to the secondary interval;
the calculation unit is used for determining a target primary interval to which the input value belongs from the primary lookup table; determining a target conversion parameter of a target secondary interval to which the input value belongs from a secondary lookup table based on a target interval parameter of the target primary interval, wherein the target secondary interval belongs to the target primary interval; acquiring a target conversion parameter corresponding to the target secondary interval from the secondary lookup table; and determining an activation value corresponding to the input value based on the target conversion parameter and the input value through a multiplexing quantization algorithm module, wherein the activation value is a fixed point number, and the quantization algorithm module is used for carrying out data quantization and inverse quantization processing.
In another aspect, embodiments of the present application provide an NPU, which includes programmable logic circuits and/or program instructions, and when the NPU is operated, the NPU is configured to implement the method for activating a neural network according to the above aspect.
In another aspect, an embodiment of the present application provides an NPU, which includes the activation device of the neural network described in the above aspect.
In another aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory, where the memory stores at least one program, and the at least one program is loaded by the processor and executed to implement the method for activating a neural network according to the foregoing aspect.
In another aspect, an embodiment of the present application provides a computer-readable storage medium, in which at least one program is stored, and the at least one program is loaded and executed by a processor to implement the method for activating a neural network according to the above aspect.
In another aspect, embodiments of the present application provide a computer program product including computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the neural network activation method described in the above aspect.
In the embodiment of the application, the computer equipment determines a target primary interval based on the input value, reads the target interval parameters of the target primary interval, determines a target secondary interval based on the target interval parameters, and further determines the target conversion parameters stored for the target secondary interval. Based on the target conversion parameters read from the newly constructed secondary lookup table and the input value, the computer equipment performs activation calculation through the multiplexed quantization algorithm module to obtain the activation value; the target conversion parameters allow activation calculation to be completed according to the inherent calculation logic of the quantization calculation module, so that the multiplexed quantization calculation module completes activation calculation based on an optimized LUT, and the optimization cost brought by performing activation calculation with a newly constructed LUT is reduced.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 illustrates a block diagram of a computer device provided in an exemplary embodiment of the present application;
FIG. 2 illustrates a flow chart of a method of activating a neural network provided by an exemplary embodiment of the present application;
FIG. 3 illustrates a schematic diagram of interval partitioning provided by an exemplary embodiment of the present application;
FIG. 4 illustrates a schematic diagram of a quantization computation module provided by an exemplary embodiment of the present application;
FIG. 5 illustrates a flow chart of activation calculation provided by an exemplary embodiment of the present application;
FIG. 6 is a diagram illustrating a multiplexing quantization computation module provided in an exemplary embodiment of the present application;
FIG. 7 is a diagram illustrating a table lookup process provided by an exemplary embodiment of the present application;
FIG. 8 illustrates a flow chart of a two-level lookup table lookup method provided by an exemplary embodiment of the present application;
fig. 9 shows a block diagram of an activation apparatus of a neural network according to an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
For convenience of understanding, terms referred to in the embodiments of the present application will be described below.
Model quantization: a method of compressing a model. In order to meet the detection accuracy requirements of various AI applications, the width, number of layers, depth, and parameter count of deep neural network structures have risen rapidly, so that deep learning models occupy large storage space and require long inference latency, which is not conducive to industrial deployment. Current models mostly run on four types of chips, namely the CPU (Central Processing Unit), GPU (Graphics Processing Unit), FPGA (Field Programmable Gate Array), and ASIC (Application Specific Integrated Circuit), whose computing power is limited relative to deep learning models. For chips on edge devices, there are many limitations in storage, memory, power consumption, and scalability, and inference efficiency is especially important.
In the contexts of computer vision, deep learning, and the like, the model refers in particular to a convolutional neural network. Quantization, the process of approximating continuous signal values with a finite number of discrete values, is a method of information compression. A model is composed of weights and biases, both stored in the float32 data type; a float32 floating-point number occupies 32 bits in storage, while an int8 fixed-point number occupies only 8 bits. Model quantization is the model compression method of representing floating-point numbers with fixed-point numbers for operation.
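As a concrete illustration of the above, the following Python sketch applies the linear quantization formula Q = round(R/S) + Z that is used later in this description to compress float32 data to int8; it is a minimal sketch for illustration only, and the function and variable names are assumptions rather than part of the disclosed hardware.

```python
import numpy as np

def quantize_tensor(r: np.ndarray, n_bits: int = 8):
    """Quantize float32 data R to fixed-point Q via Q = round(R/S) + Z."""
    q_min, q_max = 0, 2 ** n_bits - 1              # e.g. 0..255 for 8-bit storage
    r_min, r_max = float(r.min()), float(r.max())
    s = (r_max - r_min) / (q_max - q_min)          # quantization scale S
    z = int(round(q_max - r_max / s))              # quantization zero point Z
    q = np.clip(np.round(r / s) + z, q_min, q_max).astype(np.uint8)
    return q, s, z

weights = np.random.randn(4, 4).astype(np.float32)   # float32: 32 bits per value
q, s, z = quantize_tensor(weights)                    # int8 storage: 8 bits per value
```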
LUT (Look-Up Table): essentially a RAM (Random Access Memory). The LUT replaces the AND/OR logic gates needed to obtain the data with a table similar to a truth table, which holds all results of the input variables after passing through the logic gates. After the data is written into the RAM in advance, inputting a signal is equivalent to inputting an address for table lookup; when the corresponding address is found, the stored value is output as the result.
Least Squares Method: a common method for solving unconstrained optimization problems. In the embodiment of the application, the least squares method is the calculation method used to linearly fit discrete values and obtain the optimal fitting straight line. The least squares method finds the best matching function for the data by minimizing the sum of squared errors. The embodiment of the application performs linear fitting by the least squares method to obtain a linear expression with small error.
Referring to fig. 1, a block diagram of a computer device according to an exemplary embodiment of the present application is shown. Computer device 100 may include one or more of the following components: a processor 110, a memory 120.
The processor 110 is integrated with an NPU (Neural-Network Processing Unit) for performing neural network processing, executing the activation method of a neural network, and implementing Artificial Intelligence (AI) functions. The processor 110 may include one or more processing cores. The processor 110 connects various components throughout the computer device 100 using various interfaces and lines, and performs various functions of the computer device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and invoking data stored in the memory 120. Optionally, the processor 110 may be implemented in hardware using at least one of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), and a Programmable Logic Array (PLA). The processor 110 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Neural-Network Processing Unit (NPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is used for rendering and drawing the content to be displayed by the touch display screen; the modem is used to handle wireless communication. It is understood that the modem may also not be integrated into the processor 110 but implemented by a separate processor.
The Memory 120 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 120 includes a non-transitory computer-readable medium. The memory 120 may be used to store instructions, programs, code sets, or instruction sets. The memory 120 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function, instructions for implementing the various method embodiments described below, and the like; the storage data area may store data (such as audio data, a phonebook) created according to the use of the computer device 100, and the like.
In addition, those skilled in the art will appreciate that the configuration of the computer device 100 illustrated in the above figure does not constitute a limitation on the terminal, which may include more or fewer components than illustrated, combine some components, or use a different arrangement of components. For example, the computer device 100 may further include an input unit, an audio circuit, a Wi-Fi module, a power supply, a Bluetooth module, and other components, which are not described here again.
Referring to fig. 2, a flowchart of an activation method of a neural network provided in an exemplary embodiment of the present application is shown. The embodiment is described by taking the method as an example for computer equipment, and the method comprises the following steps:
step 201, determining a target first-level interval to which an input value belongs from a first-level lookup table, wherein the first-level lookup table comprises a corresponding relation between the first-level interval and an interval parameter, the first-level interval is obtained by dividing an input value range, the input value is a fixed point number, and the input value range is a fixed point number range.
When the input value is obtained, the computer device first searches the primary lookup table based on the input value. The mapping stored in the primary lookup table is the correspondence between primary intervals and interval parameters, so the computer device needs to determine the target primary interval based on the input value and then read the target interval parameter.
In the embodiment of the present application, the LUT table is used to simplify the active calculation process of the quantization model, and therefore, in the calculation process, the input value is a fixed-point number, and the corresponding input value range is a fixed-point number range.
In an exemplary example, as shown in fig. 3, the activation function interval corresponding to the primary interval is obtained by uniformly dividing the activation function, and the division granularity is a primary division granularity.
It should be noted that, in the embodiment of the present application, the activation function applied to the neural network may be the sigmoid function f(x) = 1/(1 + e^(-x)), or another nonlinear function such as the Tanh function f(x) = tanh(x) or the ReLU function f(x) = max(0, x); the present application is not limited thereto.
Step 202, based on the target interval parameters of the target primary interval, determining target conversion parameters of a target secondary interval to which the input values belong from a secondary lookup table, wherein the target secondary interval belongs to the target primary interval, each primary interval is divided into at least one secondary interval, the secondary lookup table comprises the corresponding relation between the secondary interval and the conversion parameters, and the conversion parameters are obtained by converting fitting parameters of linear fitting of a corresponding activation function of the secondary interval.
When the target primary interval is determined, the corresponding target interval parameters are read from the primary lookup table; based on the target interval parameters, the computer device calculates and determines the target secondary interval corresponding to the input value, and further determines the target conversion parameters stored for the target secondary interval. In a possible implementation manner, as shown in fig. 3, the activation function interval corresponding to the secondary interval is obtained by non-uniformly dividing the activation function; specifically, within the same primary interval the activation function is uniformly divided to obtain the secondary intervals, while the number of secondary intervals differs between primary intervals.
The second-level lookup table stores a corresponding relationship between a second-level interval and a conversion parameter, wherein the conversion parameter is obtained by converting a fitting parameter, and when an activation function curve corresponding to the second-level interval is subjected to linear fitting, a fitting straight line can be represented as:
y=A×x+C (1)
substituting the interval starting value of the secondary interval into formula 1 to obtain:
Y_n = A × X_n + C    (2)
where (X_n, Y_n) is the interval start point of the n-th secondary interval. According to Equations 1 and 2, the parameters of the fitted straight line can be simplified, and the fitted line can be represented by the following formula:
y = A × x + Y_n - A × X_n    (3)
further, in order to reduce the size of the neural network model so that the neural network can still be applied in edge application scenarios with limited computing resources, in the embodiment of the present application the neural network is calculated in a model quantization manner. Accordingly, in the process of activation through linear fitting, the fitted straight line represented by Equation 3 needs to be quantized. The linear quantization formula is as follows:
Q=R/S+Z
where R is a floating-point number, namely the real data before quantization; Q is the fixed-point number obtained by quantizing the real floating-point data; S is the quantization scale used when quantizing floating-point numbers; and Z is the quantization zero point, representing the data offset introduced by the quantization process.
Substituting the linear quantization formula into formula 3 can obtain a quantized fitting straight line, as follows:
(Q_y - Z_y) × S_y = A × S_x × (Q_x - Z_x) + Y_n - A × X_n    (4)
Mathematical transformation of Equation 4 yields:
Q_y = (A × S_x / S_y) × (Q_x - Z_x) + (Y_n - A × X_n) / S_y + Z_y    (5)
Equation 5 gives the activation value Q_y, and the two parameters in it are derived from the fitting parameters: the slope parameter α = A × S_x / S_y and the intercept parameter c = (Y_n - A × X_n) / S_y + Z_y - α × Z_x. With these, the above formula simplifies to the fitted-line expression:
Q_y = α × Q_x + c    (6)
alternatively, the fitting parameters may be determined by a least squares method.
Further, the secondary lookup table stores the conversion parameters corresponding to the secondary intervals, so once the target secondary index is determined it can be used as the lookup address to determine the target secondary interval, and the target conversion parameters stored for the target secondary interval are read; the activation value can then be calculated based on the target conversion parameters.
And 203, determining an activation value corresponding to the input value based on the target conversion parameter and the input value through a multiplexing quantization algorithm module, wherein the activation value is a fixed point number, and the quantization algorithm module is used for data quantization and inverse quantization processing.
When the target conversion parameters stored in the secondary lookup table have been read by table lookup, the activation calculation is completed through the fitted-line expression (Equation 6) to obtain the activation value. In the embodiment of the application, in order to reduce the optimization cost of the LUT table, the fitted-line expression is mathematically transformed so that its calculation logic conforms to the inherent operation logic of the quantization algorithm module; that is, the computer device can complete the activation calculation by multiplexing the quantization calculation module.
In summary, the computer device determines the target first-level interval based on the input value, reads the target interval parameter in the target first-level interval, determines the target second-level interval based on the target interval parameter, and further reads the target conversion parameter stored in the target second-level interval. Based on the target conversion parameters and the input values read from the newly-constructed secondary lookup table, the computer equipment performs activation calculation through the multiplexing quantization algorithm module to obtain the activation values, wherein the target conversion parameters can complete activation calculation according to the inherent calculation logic of the quantization calculation module, so that the multiplexing quantization calculation module can complete activation calculation based on the optimized LUT, and the optimization cost brought by using the newly-constructed LUT for activation calculation is reduced.
In the embodiment of the present application, a computer device completes activation calculation by multiplexing a quantization algorithm module, and in a possible implementation, based on that the quantization algorithm module is used for performing data quantization and inverse quantization processing, the present application takes linear quantization as an example for description, and the quantization algorithm module includes the following calculation procedures:
in performing data inverse quantization, the linear inverse quantization calculation can be represented by the following formula:
R=round(S(Q-Z))
wherein, R is corresponding floating point data obtained by inverse quantization of the fixed point data; round () indicates a function that rounds a value; s is a scaling factor, which represents a scaling ratio between the fixed point number and the floating point number in the inverse quantization calculation, and optionally, the scaling factor S may be calculated by the following formula:
S = (R_max - R_min) / (Q_max - Q_min)
where R_max and R_min are the maximum and minimum values of the floating-point numbers obtained by the inverse quantization calculation, and Q_max and Q_min are the maximum and minimum values of the fixed-point numbers before inverse quantization. That is, S indicates the ratio between the floating-point value range and the fixed-point value range before and after the quantization calculation. Schematically, in a quantization calculation that quantizes floating-point numbers to 8-bit fixed-point numbers, the scaling factor may be expressed as S = (R_max - R_min) / 255. Z is the quantization zero point, representing the offset introduced in the quantization process; optionally, Z may be calculated by the formula Z = round(Q_max - R_max / S). The formulas for calculating S and Z are only illustrative, and the present application is not limited thereto.
The above linear inverse quantization formula contains the following calculation logic: first, the subtraction Q - Z is calculated; then, the subtraction result is multiplied by S; finally, the overall result is rounded. Optionally, when the processor does not support floating-point calculation, in order to realize fully quantized inference, the data may be converted in an earlier stage into fixed-point numbers amplified by 2^n for operation, in which case a right-shift calculation needs to be introduced during the inverse quantization calculation to restore the data.
Correspondingly, as shown in fig. 4, in addition to the input module 401 and the output module 408, the quantization calculation module includes the following calculation modules in sequence: a subtraction calculation module 402, a multiplication calculation module 403, a right-shift calculation module 404, and a rounding calculation module 405. The quantization calculation module performs the Q - Z operation with the subtraction calculation module 402, where ZP1 indicates the quantization zero point Z in the above linear inverse quantization formula; it then performs the S(Q - Z) operation with the multiplication calculation module 403, and performs the round(S(Q - Z)) operation with the right-shift calculation module 404 to restore the data. It should be noted that, limited by the hardware's data calculation method, ZP1 in the calculation module needs to be an integer.
The quantization computation module simultaneously undertakes the task of quantization computation, and during the process of data quantization, the linear quantization computation can be expressed by the following formula:
Q=round(R/S)+Z
the linear inverse quantization formula is the same as the linear inverse quantization formula, wherein Q is fixed point number data obtained through quantization calculation, R is floating point number data before quantization calculation, S is a scaling factor, and Z is a quantization zero point.
This quantization formula contains the following calculation logic: first, the division R/S is performed; then, since the division result may not be an integer, R/S is rounded; finally, the rounding result is added to the quantization zero point Z. Correspondingly, as shown in fig. 4, the quantization calculation module sequentially includes the following calculation modules: the right-shift calculation module 404, the rounding calculation module 405, and an addition calculation module 406. The quantization calculation module performs the R/S operation with the right-shift calculation module 404, then performs the round(R/S) operation with the rounding calculation module 405, and performs the round(R/S) + Z operation with the addition calculation module 406. It should be noted that, in order to avoid numerical overflow during the quantization calculation, the quantization result needs to be clipped, so the quantization calculation module further includes a clipping calculation module 407. It should also be noted that, limited by the hardware's data calculation method, ZP2 in the calculation module needs to be an integer.
Optionally, the quantization calculation module corresponds to hardware in the NPU, where the subtraction calculation module 402 may be a subtractor, the multiplication calculation module 403 may be a multiplier, the right shift calculation module 404 may be implemented by using a shifter, and the addition calculation module 406 may be an adder.
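A minimal software sketch of the two paths of the quantization calculation module follows, assuming integer-only arithmetic with the scale realized as mul / 2**n so that the multiply-then-right-shift structure is explicit; the module numbers in the comments refer to fig. 4, and everything else is an illustrative assumption.

```python
def dequantize(q: int, zp1: int, mul: int, n: int) -> int:
    """Inverse quantization path: R = round(S * (Q - Z)), S taken as mul / 2**n."""
    acc = q - zp1                        # subtraction module 402: Q - Z
    acc = acc * mul                      # multiplication module 403: S * (Q - Z)
    return (acc + (1 << (n - 1))) >> n   # modules 404/405: right shift with rounding

def quantize(r_scaled: int, zp2: int, n: int, q_min: int = 0, q_max: int = 255) -> int:
    """Quantization path: Q = round(R / S) + Z, with R pre-scaled so that
    the division R / S becomes a right shift by n bits."""
    q = ((r_scaled + (1 << (n - 1))) >> n) + zp2   # modules 404, 405, 406
    return max(q_min, min(q_max, q))               # clipping module 407
```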
In order to multiplex the quantization algorithm module for performing the LUT table-based activation function calculation, the activation calculation formula needs to be mathematically transformed, so that the calculation logic of the activation calculation formula is identical to the inherent logic of the quantization calculation module.
Based on the quantization calculation module, the first calculation module is an addition calculation module, so the first step of the corresponding activation calculation should also be an addition operation. Based on Equation 6, Q_y = α × Q_x + c, α can be factored out as a common factor, converting Equation 6 into:
Q_y = α × (Q_x + c/α)    (7)
After the conversion, Q_x + c/α in Equation 7 above may be calculated by the addition calculation module, with c/α corresponding to ZP1. Meanwhile, the multiplication in Equation 7 may further be performed by the multiplication calculation module in the quantization calculation module. Because α represents a slope parameter, its value can be extremely small when fitting a smooth part of the activation function curve, so the value of c/α can become extremely large; to avoid numerical overflow, when the value of c/α is greater than (1/4) × Q_range it is set equal to (1/4) × Q_range, where Q_range is the difference between the maximum and minimum values that can be expressed by the fixed-point number.
It should be noted that, limited by the hardware's data calculation method, ZP1 needs to be an integer, so the parameter needs to be rounded when stored to guarantee that it is an integer. Further, in combination with the actual data storage form, Equation 7 above can be expressed as:
Q_y = α × (Q_x + round(c/α))    (8)
based on the above formula 8, round (c/α) is a parameter required in the activation calculation, and accordingly, when the activation calculation is performed by using the LUT table, round (c/α) should be a first conversion parameter stored in the secondary lookup table, where the first conversion parameter is obtained by calculation conversion from the slope parameter α and the intercept parameter c in the fitting parameters.
It should be noted that, in Equation 8 above, α is also a parameter required in the activation calculation. Since the LUT table can only store integers, when the slope parameter is stored in the secondary lookup table, the amplified result of the slope parameter, namely the second conversion parameter Mul obtained through calculation and conversion, needs to be stored. Accordingly, in combination with the actually stored data, Equation 8 above can be expressed as:
Q_y = Mul × (Q_x + round(c/α)) >> N    (9)
where the second conversion parameter is the slope parameter amplified by 2^N, and the data is restored by the right-shift calculation, i.e., α = Mul / 2^N; the right-shift result is the result of shifting N bits to the right, where N is a positive integer.
Based on the formula conversion, when activation calculation is completed through the LUT table, the secondary lookup table includes the correspondence between a secondary interval, a first conversion parameter, and a second conversion parameter, where the first conversion parameter is the rounded result of the intercept-to-slope ratio, and the second conversion parameter is the amplified result of the slope parameter.
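The following sketch shows how the two conversion parameters could be derived offline from the fitting parameters when the secondary lookup table is built; the clamp of c/α at (1/4) × Q_range follows the overflow guard described above, and the concrete bit widths are assumptions.

```python
def to_lut_entry(alpha: float, c: float, n: int, q_range: int = 255):
    """Derive the secondary-LUT entry of Equation 9 from fitting parameters."""
    ratio = c / alpha
    limit = q_range / 4                    # overflow guard described above
    ratio = max(-limit, min(limit, ratio))
    zp1 = int(round(ratio))                # first conversion parameter: round(c/alpha)
    mul = int(round(alpha * (1 << n)))     # second conversion parameter: Mul = alpha * 2**N
    return zp1, mul

zp1, mul = to_lut_entry(alpha=0.8, c=12.3, n=8)    # zp1 == 15, mul == 205
```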
Equation 9 is the activation calculation formula converted from the fitted-line expression (Equation 6) and conforms to the inherent calculation logic of the quantization calculation module; when activation calculation is performed based on the LUT table, the secondary lookup table stores the first conversion parameter and the second conversion parameter. Referring to fig. 5, a flowchart of activation calculation provided by an exemplary embodiment of the present application is shown. The activation calculation may include the following steps:
step 501, determining the sum of the input value and the first target conversion parameter as a first intermediate value.
The first target conversion parameter is the first conversion parameter stored for the target secondary interval and may be represented as round(c/α), so the first intermediate value is obtained as: Q_x + round(c/α). That is, as shown in fig. 6, when the first target conversion parameter is read by looking up the secondary lookup table, corresponding to the quantization calculation module 610, the computer device first performs the first-intermediate-value calculation 621 with the subtraction calculation module 611 based on the activation calculation formula (Equation 9) 620.
Step 502, determine the product of the second target conversion parameter and the first intermediate value as a second intermediate value.
The second target conversion parameter is the second conversion parameter stored for the target secondary interval and may be expressed as Mul, so the second intermediate value is obtained as Mul × (Q_x + round(c/α)). That is, as shown in fig. 6, when the second target conversion parameter is read by looking up the secondary lookup table, corresponding to the quantization calculation module 610, the computer device performs the second-intermediate-value calculation 622 using the multiplication calculation module 612 based on the activation calculation formula (Equation 9) 620 after completing the addition operation.
Step 503, determining the right-shift result of the second intermediate value as the activation value, where the second conversion parameter is the slope parameter amplified by 2^N and the right-shift result is the result of shifting N bits to the right, N being a positive integer.
Since the target second conversion parameter is the slope parameter amplified by 2^N, data restoration processing, namely right-shift calculation, needs to be performed correspondingly during activation calculation. Accordingly, as shown in fig. 6, the right-shift calculation 623 can be performed by the right-shift calculation module 613 in the quantization calculation module 610.
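Taken together, steps 501 to 503 amount to the following minimal sketch of Equation 9, in which the subtractor, multiplier and shifter of the quantization module are reused; it illustrates the data flow only and is not the hardware implementation itself.

```python
def activate_q(q_x: int, zp1: int, mul: int, n: int) -> int:
    """Equation 9: Q_y = Mul * (Q_x + round(c/alpha)) >> N."""
    first = q_x + zp1     # step 501: first intermediate value
    second = first * mul  # step 502: second intermediate value
    return second >> n    # step 503: right shift restores the 2**N amplification
```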
In Equation 9 above, in order to ensure that ZP1 is an integer, the secondary lookup table stores the first conversion parameter obtained by rounding. Considering that the rounding operation may introduce a certain error, when storage space allows, the secondary lookup table may include the correspondence among a secondary interval, the first conversion parameter, the second conversion parameter, and a third conversion parameter, where the third conversion parameter is the rounded result of the product of the slope parameter and an error parameter, and the error parameter is the difference between the intercept-to-slope ratio and the first conversion parameter.
The third conversion parameter indicates the error introduced by rounding the ratio of the intercept parameter to the slope parameter, which may be expressed as Δ = c/α - round(c/α). In order to reduce this error while keeping the calculation logic matched with the inherent calculation logic of the quantization calculation module, an addition operation can be introduced on the basis of Equation 9, with the error restoration completed by the addition calculation module in the quantization calculation module; that is, α × Δ needs to be added on the basis of Equation 9.
It should be noted that, when the addition calculation module in the quantization calculation module is used to add α × Δ, α × Δ corresponds to ZP2. Limited by the inherent operation manner of the hardware, ZP2 needs to be an integer, so the third conversion parameter is obtained by rounding: round(α × Δ). That is, the activation calculation can be expressed by the following formula:
Q_y = (Mul × (Q_x + round(c/α)) >> N) + round(α × Δ)    (10)
accordingly, performing the activation calculation based on equation 10 may include the steps of:
1. the sum of the input value and the first target conversion parameter is determined as a first intermediate value.
This step is the same as step 501, and is not described herein again.
2. The product of the second target conversion parameter and the first intermediate value is determined as a second intermediate value.
This step is the same as step 502 and will not be described herein again.
3. Determining the right-shift result of the second intermediate value as a third intermediate value, where the second conversion parameter is the slope parameter amplified by 2^N and the right-shift result is the result of shifting N bits to the right, N being a positive integer.
The third intermediate value may be expressed as: Mul × (Q_x + round(c/α)) >> N. That is, when the amplification factor N is read through the secondary lookup table, corresponding to the quantization calculation module, the computer device performs the right-shift calculation using the right-shift calculation module based on the activation calculation formula (Equation 10) after completing the multiplication operation.
It should be noted that, in the same secondary lookup table, every second conversion parameter is obtained by amplification with the same amplification factor, that is, the value of N is unique; further, in a possible implementation, only one amplification factor N is stored in the secondary lookup table.
4. The sum of the third intermediate value and the third target conversion parameter is determined as the activation value.
The third target conversion parameter is the third conversion parameter stored for the target secondary interval and may be represented as round(α × Δ). As shown in fig. 6, when the third target conversion parameter is read by looking up the secondary lookup table, corresponding to the quantization calculation module 610, the computer device performs the above addition calculation 631 using the addition calculation module 614 based on the activation calculation formula (Equation 10) 630 after completing the right-shift calculation.
In summary, under the condition that the storage space allows, a third conversion parameter is introduced into the activation calculation so as to reduce the error caused by formula transformation, and improve the accuracy of the activation calculation.
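A sketch of the error-compensated form is given below; it extends the Equation 9 pipeline with the adder contributing the third conversion parameter round(α × Δ), as in Equation 10, and the parameter names are illustrative.

```python
def activate_q_compensated(q_x: int, zp1: int, mul: int, n: int, zp2: int) -> int:
    """Equation 10: Q_y = (Mul * (Q_x + round(c/alpha)) >> N) + round(alpha * delta)."""
    third = (mul * (q_x + zp1)) >> n   # steps 1-3, identical to Equation 9
    return third + zp2                 # step 4: adder adds the third conversion parameter
```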
In the process of performing activation calculation based on the lookup table, when an input value is obtained, a first-level interval in a first-level lookup table corresponding to the input value is determined first, and each entry in the first-level lookup table corresponds to one first-level interval. In a possible implementation manner, the first-level intervals are obtained by uniformly dividing the range of the input value based on the first-level division granularity, that is, the number of the first-level intervals can be calculated by the following formula:
S_num = Q_range / ΔS
where S_num is the number of first-level intervals in the first-level lookup table; Q_range is the input value range, i.e., the value range of the quantized fixed-point number, which can be expressed as Q_range = Q_max - Q_min; and ΔS is the primary partition granularity. Since the primary interval corresponding to the input value does not need to be determined with high accuracy in the primary lookup table, in a possible implementation the input value range in the primary interval may take the form of 8-bit data, i.e., the maximum input value range is Q_range = 255, indicating that a legal input value may be any value from 0 to 255. Illustratively, when the input value ranges from 0 to 255 and the primary partition granularity is 8, the primary lookup table includes 32 primary intervals.
Further, a target primary index corresponding to the input value is determined based on the input value, the minimum value in the input value range, and the primary partition granularity. In one possible implementation, the target primary index may be calculated by the following formula:
target primary index = floor((Q_x - Q_min) / ΔS)    (11)
where Q_x is the input value, Q_min is the minimum value of the input value range, and floor() represents the floor function.
Illustratively, as shown in fig. 7, in the case where the input value is 123, the minimum value in the input value range is 0, and the one-level division granularity is 8, the target one-level index is 15 according to equation 11.
Further, based on the one-to-one correspondence between the target primary index and each primary interval, under the condition that the target primary index is obtained through calculation, the primary interval corresponding to the target primary index in the primary lookup table is determined as the target primary interval.
The LUT-based table is essentially a process of reading the content of a corresponding entry through a target primary index, i.e., a process of reading data information stored at a memory address based on the address. In the embodiment of the present application, the target primary index is used as an address, and the interval parameter stored in the address is a target interval parameter. In one possible implementation, the target interval parameters include a starting secondary index of a secondary interval in the target primary interval, a target secondary interval number of the target primary interval, and an interval starting value of the target primary interval. The secondary intervals in each primary interval are uniformly divided, that is, the secondary intervals are obtained by uniformly dividing curves in each primary interval.
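The primary lookup can be sketched as follows; the tuple layout of a primary-table entry (interval start value, starting secondary index, number of secondary intervals) is an assumption made for illustration, populated with the values of the worked example in fig. 7.

```python
def lookup_primary(q_x: int, q_min: int, delta_s: int, primary_lut: dict):
    """Equation 11: target primary index = floor((Q_x - Q_min) / delta_S)."""
    idx = (q_x - q_min) // delta_s
    return idx, primary_lut[idx]     # entry = (start value Q0, start index E0, count P)

primary_lut = {15: (120, 60, 3)}     # entry used in the worked example of fig. 7
idx, (q0, e0, p) = lookup_primary(123, 0, 8, primary_lut)   # idx == 15
```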
In a possible implementation manner, the target secondary interval can be determined based on the target interval parameter, and as to a manner of determining the target secondary interval, please refer to fig. 8, which shows a flowchart of determining the target secondary interval according to an exemplary embodiment of the present application.
Step 801, determining the secondary partition granularity of the secondary interval in the target primary interval based on the primary partition granularity and the number of the target secondary intervals.
In the embodiment of the application, in order to ensure the accuracy of the activation calculation while saving storage space, an appropriate partition granularity needs to be selected for the secondary intervals. Correspondingly, a smaller partition granularity is adopted for curve intervals where the slope of the activation function curve changes more before the fitting parameters are determined; that is, a primary interval with larger curve slope change should contain more secondary intervals, so that the curve segment in each secondary interval is as close to a straight line as possible. In the secondary lookup table, the value range covered by each secondary interval is non-uniform, but it should be noted that the secondary intervals within each primary interval are uniformly divided. Since the secondary partition granularity differs between primary intervals according to the degree of slope change of the curve within them, the secondary partition granularity needs to be determined first when determining the target secondary interval from the primary interval.
Each entry in the first-level lookup table at least stores a first-level index and the number of corresponding second-level intervals in the first-level intervals. The calculation method for determining the secondary partition granularity by the primary partition granularity and the number of the secondary intervals can be represented by the following formula:
ΔP=ΔS/P (12)
where ΔP is the secondary partition granularity, ΔS is the primary partition granularity, and P is the number of secondary intervals. It should be noted that the primary intervals are obtained by uniformly dividing the activation curve, so for the same primary lookup table, the primary partition granularity ΔS is a fixed value.
Illustratively, as shown in fig. 7, in the case that the primary partition granularity is 8, when P is read as 3 in the primary lookup table, the secondary partition granularity in the target primary interval is 8/3, which can be calculated by equation 12.
In the above process, the secondary partition granularity is determined based on the number of the secondary intervals stored in the read target primary interval, and in order to improve the accuracy of the activation value, the linear fitting effect needs to be improved, that is, a smaller secondary partition granularity needs to be set in the primary interval with a large change in the slope of the curve, and the number of the secondary intervals is increased to perform linear fitting. In one possible embodiment, the number of secondary intervals in each primary interval is positively correlated with the rate of change of the second derivative of the activation function in the primary interval.
Correspondingly, when a secondary lookup table is constructed, i points on a corresponding activation function curve are respectively determined in each primary interval to serve as reference points, second derivatives of the reference points are calculated, and then the i values calculated in each primary interval are summed, wherein the sum of the second derivatives in each primary interval represents the change condition of the slope of the curve in the primary interval, namely the slope of the activation curve in the primary interval with the larger sum of the second derivatives changes faster. Based on the determined number of the second-level intervals, determining the number of the second-level intervals in each first-level interval according to the sum of second-order derivatives in each first-level interval in proportion, so as to ensure that more second-level intervals are correspondingly divided in the first-level intervals with larger curve slope changes, and determine that the second-level division granularity is smaller; correspondingly, fewer secondary intervals are divided in the primary interval with smaller slope change of the curve, so that the storage space of a secondary lookup table is saved, and a better fitting effect can be obtained while the number of the secondary intervals is controlled.
It should be noted that, limited to the hardware calculation method, the total number of the second-level intervals needs to be a power of 2.
Illustratively, when a secondary lookup table is constructed, the activation curve is divided into 32 primary intervals, 10 points are respectively determined in each primary interval, second derivatives of the points are calculated for summation, and when the total number of the secondary intervals is 128, the number of the secondary intervals in each primary interval is determined according to a certain proportion according to the sum of the second derivatives in each primary interval. The specific manner of determining the number of the second-stage intervals in proportion may be to multiply the total number of the second-stage intervals by the proportion and perform rounding, or to perform sorting according to the sum of the second derivatives and determine the number of the second-stage intervals in predefined proportion, and the like, and the above manner is only used for illustration, and the present application does not limit this.
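A hedged sketch of this allocation follows; it samples i reference points per primary interval, sums the absolute second derivative there, and splits the power-of-two total proportionally. The proportional rounding used here is only one of the strategies mentioned above and is an assumption.

```python
import numpy as np

def sigmoid_d2(x):
    f = 1.0 / (1.0 + np.exp(-x))
    return f * (1 - f) * (1 - 2 * f)     # second derivative of the sigmoid

def allocate_secondary(edges, total=128, i_points=10):
    """Split `total` secondary intervals across primary intervals in
    proportion to the summed |second derivative| at the reference points."""
    sums = np.array([np.abs(sigmoid_d2(np.linspace(lo, hi, i_points))).sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])
    counts = np.maximum(1, np.round(total * sums / sums.sum())).astype(int)
    return counts                        # more intervals where the slope changes fast
```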
Step 802, determining an index offset based on the input value, the interval start value of the target primary interval, and the secondary partition granularity.
Each entry in the first-level lookup table at least stores a first-level index, an interval starting value, a starting second-level index and a second-level interval number. The interval starting value indicates the starting value of the input value range in the target primary interval, and the secondary division granularity indicates the division granularity applicable to further dividing the secondary interval in the target primary interval. The calculation process for determining the index offset based on the input value, the interval start value, and the secondary partition granularity can be represented by the following formula:
ΔE = floor((Q_x - Q_0) / ΔP)    (13)
where ΔE represents the index offset, Q_x is the input value, Q_0 is the interval start value of the target primary interval, ΔP is the secondary partition granularity in the target primary interval, and floor() represents the floor function.
Illustratively, as shown in fig. 7, when the input value is 123, the primary interval corresponding to the target primary index 15 is determined as the target primary interval, and Q_0 is read as 120 and P as 3. Based on the primary partition granularity of 8, the calculated secondary partition granularity ΔP is 8/3, and the index offset ΔE calculated by Equation 13 is 1.
Step 803, determine the target secondary index based on the index offset and the starting secondary index.
The index offset represents the deviation between the initial secondary index stored in the target primary interval and the target secondary index corresponding to the input value. Further, the process of calculating the determined target secondary index can be represented by the following calculation formula:
E = ΔE + E_0    (14)
where E is the target secondary index, E_0 is the starting secondary index read from the target primary interval, and ΔE is the index offset calculated in step 802.
Illustratively, as shown in fig. 7, each primary interval of the primary lookup table stores a primary index, an interval start value, a starting secondary index, and a number of secondary intervals. When the input value is 123, for a primary lookup table with a primary partition granularity of 8, the target primary index is 15; reading the corresponding target primary interval gives an interval start value of 120, a starting secondary index of 60, and 3 secondary intervals. The index offset is then calculated to be 1, and the target secondary index calculated by Equation 14 is 61.
And step 804, determining a secondary interval corresponding to the target secondary index in the secondary lookup table as a target secondary interval.
There is a one-to-one correspondence between secondary indexes and secondary intervals, and the mapping relationship between them is stored in the secondary lookup table; therefore, once the target secondary index is determined from the target interval parameters, the secondary interval corresponding to the target secondary index in the secondary lookup table can be determined as the target secondary interval. By the nature of the secondary lookup table, it is likewise data stored in a Random Access Memory (RAM), and the target secondary index serves as the lookup address, so the computer device can query the secondary lookup table once the lookup address is determined and find the target secondary interval indicated by the target secondary index.
Step 805, obtaining a target conversion parameter corresponding to the target secondary interval from the secondary lookup table.
The secondary lookup table stores the conversion parameters in table form, and the target secondary interval is the target entry in the secondary lookup table in which the target conversion parameters used for determining the activation value are stored. The computer device can thus read the target conversion parameters for the target secondary interval.
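Combining steps 801 to 805 with the activation calculation of fig. 6 gives the following end-to-end sketch; the table contents mirror the worked example (input 123, primary index 15, secondary index 61), and the secondary-table entry values are illustrative assumptions.

```python
def lookup_and_activate(q_x, q_min, delta_s, primary_lut, secondary_lut, n):
    idx = (q_x - q_min) // delta_s           # Equation 11: target primary index
    q0, e0, p = primary_lut[idx]             # target interval parameters
    delta_p = delta_s / p                    # step 801: secondary partition granularity
    delta_e = int((q_x - q0) // delta_p)     # step 802 / Equation 13: index offset
    e = e0 + delta_e                         # step 803 / Equation 14: target secondary index
    zp1, mul, zp2 = secondary_lut[e]         # steps 804-805: read conversion parameters
    return ((q_x + zp1) * mul >> n) + zp2    # Equation 10 on the quantization module

secondary_lut = {61: (15, 205, 0)}           # illustrative entry for index 61
y = lookup_and_activate(123, 0, 8, {15: (120, 60, 3)}, secondary_lut, 8)   # y == 110
```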
In summary, in the embodiment of the present application, the computer device first searches the uniformly partitioned primary lookup table based on the input value and determines the corresponding target primary interval; the target interval parameters stored in that interval then assist the search by locating the target secondary interval in the secondary lookup table, whose partition granularity varies between primary intervals. Even when the secondary intervals are non-uniform, the target secondary interval can still be determined quickly without additional hardware such as comparators, which saves storage space for the secondary intervals, reduces device power consumption, and realizes the lookup and read process of a non-uniform lookup table.
Referring to fig. 9, a block diagram of an activation apparatus for a neural network according to an exemplary embodiment of the present application is shown, the apparatus including:
the device comprises a first lookup table unit 901, a second lookup table unit 902 and a calculating unit 903, wherein the calculating unit 903 is connected to the first lookup table unit 901 and the second lookup table unit 902, respectively. It should be noted that the first lookup table unit 901, the second lookup table unit 902, and the calculating unit 903 are all hardware units in an NPU.
The first lookup table unit 901 is configured to store a first-level lookup table, where the first-level lookup table includes a corresponding relationship between a first-level interval and an interval parameter, the first-level interval is obtained by dividing an input value range, the input value is a fixed-point number, and the input value range is a fixed-point number range;
the second lookup table unit 902 is configured to store a secondary lookup table, where the secondary lookup table includes a correspondence between the secondary interval and a conversion parameter, and the conversion parameter is obtained by converting a fitting parameter of linear fitting of an activation function corresponding to the secondary interval;
the calculating unit 903 is configured to determine a target primary interval to which the input value belongs from the primary lookup table; determining a target conversion parameter of a target secondary interval to which the input value belongs from a secondary lookup table based on a target interval parameter of the target primary interval, wherein the target secondary interval belongs to the target primary interval; acquiring a target conversion parameter corresponding to the target secondary interval from the secondary lookup table; and determining an activation value corresponding to the input value based on the target conversion parameter and the input value through a multiplexing quantization algorithm module, wherein the activation value is a fixed point number, and the quantization algorithm module is used for carrying out data quantization and inverse quantization processing.
Optionally, the second lookup table unit 902 is further configured to:
storing conversion parameters obtained by converting the fitting parameters, wherein the fitting parameters comprise a slope parameter and an intercept parameter, the secondary lookup table comprises the correspondence among the secondary interval, a first conversion parameter and a second conversion parameter, the first conversion parameter is the rounding result of the intercept-to-slope ratio, and the second conversion parameter is the amplified result of the slope parameter.
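The following is a minimal sketch of how such conversion parameters could be derived offline from a segment's linear fit y ≈ k*x + b; the exponent N = 15 and all names are illustrative assumptions, not values fixed by this embodiment:

```python
def make_conversion_params(k: float, b: float, n_bits: int = 15):
    """Build (C1, C2) from a linear fit y ~ k*x + b (illustrative sketch)."""
    c1 = round(b / k)              # first parameter: rounded intercept/slope ratio
    c2 = round(k * (1 << n_bits))  # second parameter: slope amplified by 2^N
    return c1, c2

# For k = 0.5, b = 3.0 this gives c1 = 6 and c2 = 16384.
```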
Optionally, the second lookup table unit 902 is further configured to:
storing the target conversion parameters, wherein the target conversion parameters comprise a first target conversion parameter and a second target conversion parameter.
Optionally, the calculating unit 903 is further configured to:
determining a sum of the input value and the first target conversion parameter as a first intermediate value;
determining a product of a second target conversion parameter and the first intermediate value as a second intermediate value;
determining a result of right-shifting the second intermediate value as the activation value, wherein the second conversion parameter is the slope parameter multiplied by 2^N, the result of the right shift is the result of shifting N bits to the right, and N is a positive integer.
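The three steps above can be sketched as follows, under the same illustrative assumptions (N = 15, names not from the patent text); ((Q_x + C1) * C2) >> N approximates k*Q_x + b because C2 / 2^N ≈ k and C1 ≈ b/k:

```python
def activate_two_param(qx: int, c1: int, c2: int, n_bits: int = 15) -> int:
    first = qx + c1          # first intermediate value: input plus C1
    second = c2 * first      # second intermediate value: product with C2
    return second >> n_bits  # right shift by N bits yields the activation value

# With k = 0.5, b = 3.0 (so c1 = 6, c2 = 16384) and input 10:
assert activate_two_param(10, 6, 16384) == 8  # matches 0.5 * 10 + 3
```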
Optionally, the second lookup table unit 902 is further configured to:
storing the target conversion parameters, wherein the target conversion parameters are obtained by converting fitting parameters, the fitting parameters comprise a slope parameter and an intercept parameter, the secondary lookup table comprises the correspondence among the secondary interval, a first conversion parameter, a second conversion parameter and a third conversion parameter, the first conversion parameter is the rounding result of the intercept-to-slope ratio, the second conversion parameter is the amplified result of the slope parameter, the third conversion parameter is the rounding result of the product of the slope parameter and an error parameter, and the error parameter is the difference between the intercept-to-slope ratio and the first conversion parameter.
Optionally, the second lookup table unit 902 is further configured to:
storing the target conversion parameters, wherein the target conversion parameters comprise a first target conversion parameter, a second target conversion parameter and a third target conversion parameter.
Optionally, the calculating unit 903 is further configured to:
determining a sum of the input value and the first target conversion parameter as a first intermediate value;
determining a product of a second target conversion parameter and the first intermediate value as a second intermediate value;
determining a result of right-shifting the second intermediate value as a third intermediate value, wherein the second conversion parameter is the slope parameter multiplied by 2^N, the result of the right shift is the result of shifting N bits to the right, and N is a positive integer;
determining a sum of the third intermediate value and the third target conversion parameter as the activation value.
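Under the same illustrative assumptions, a minimal sketch of this three-parameter variant, in which the third parameter compensates the rounding error introduced by the first:

```python
def make_params_3(k: float, b: float, n_bits: int = 15):
    """Build (C1, C2, C3) from a linear fit y ~ k*x + b (illustrative sketch)."""
    c1 = round(b / k)              # first parameter: rounded intercept/slope ratio
    c2 = round(k * (1 << n_bits))  # second parameter: slope amplified by 2^N
    err = b / k - c1               # error parameter: rounding residual of c1
    c3 = round(k * err)            # third parameter: rounded slope * error
    return c1, c2, c3

def activate_three_param(qx: int, c1: int, c2: int, c3: int, n_bits: int = 15) -> int:
    third = ((qx + c1) * c2) >> n_bits  # third intermediate value
    return third + c3                    # activation value with error correction
```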
Optionally, the first lookup table unit 901 is further configured to:
storing the correspondence between the primary intervals and the interval parameters, wherein the primary intervals are obtained by uniformly dividing the input value range based on the primary partition granularity.
Optionally, in a case that the target primary interval to which the input value belongs is determined from the primary lookup table, the calculating unit 903 is further configured to:
determining a target primary index corresponding to the input value based on the input value, the minimum value of the input value range and the primary partition granularity;
and determining a primary interval corresponding to the target primary index in the primary lookup table as the target primary interval.
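A minimal sketch of the two steps above, assuming (as the worked example suggests) a minimum input value of 0 and a primary partition granularity of 8; the function name is illustrative only:

```python
def target_primary_index(qx: int, q_min: int, granularity: int) -> int:
    # floor((Qx - Qmin) / G): uniform primary partition makes this a direct index
    return (qx - q_min) // granularity

assert target_primary_index(123, 0, 8) == 15  # input 123 -> primary index 15
```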
Optionally, the first lookup table unit 901 is further configured to:
storing the correspondence between the primary intervals and the interval parameters, wherein the secondary intervals within each primary interval are uniformly divided, and the target interval parameters comprise the starting secondary index of the secondary intervals in the target primary interval, the number of secondary intervals of the target primary interval, and the interval starting value of the target primary interval.
Optionally, in a case that a target secondary interval to which the input value belongs is determined from a secondary lookup table based on a target interval parameter of the target primary interval, the calculating unit 903 is further configured to:
determining the secondary partition granularity of the secondary interval in the target primary interval based on the primary partition granularity and the target secondary interval number;
determining an index offset based on the input value, the interval starting value of the target primary interval and the secondary partition granularity;
determining a target secondary index based on the index offset and the starting secondary index;
determining a secondary interval corresponding to the target secondary index in the secondary lookup table as the target secondary interval;
and acquiring target conversion parameters corresponding to the target secondary interval from the secondary lookup table.
Optionally, the first lookup table unit 901 is further configured to:
storing the correspondence between the primary intervals and the interval parameters, wherein the interval parameters comprise the starting secondary index of the secondary intervals in the target primary interval, the number of secondary intervals of the target primary interval, and the interval starting value of the target primary interval, and the number of secondary intervals in each primary interval is positively correlated with the rate of change of the second derivative of the activation function in that primary interval.
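As a purely illustrative sketch of this positive correlation (the allocation rule and all names below are assumptions, not the patent's method), more secondary intervals can be assigned to primary intervals where the second derivative of the activation function changes faster:

```python
import numpy as np

def allocate_secondary_counts(f, edges, max_count=8, samples=64):
    """Assign each primary interval [edges[i], edges[i+1]) a secondary-interval
    count proportional to how much f's second derivative varies inside it."""
    variations = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        x = np.linspace(lo, hi, samples)
        d2 = np.gradient(np.gradient(f(x), x), x)  # numeric second derivative
        variations.append(float(np.abs(np.diff(d2)).sum()))
    peak = max(variations) or 1.0                  # avoid division by zero
    return [max(1, round(max_count * v / peak)) for v in variations]

# Example: a sigmoid gets more sub-intervals near 0, fewer in the flat tails.
counts = allocate_secondary_counts(lambda x: 1 / (1 + np.exp(-x)),
                                   edges=[-8, -4, 0, 4, 8])
```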
Optionally, the second lookup table unit 902 is further configured to:
and storing the target conversion parameters, wherein the target conversion parameters are obtained by converting fitting parameters, and the fitting parameters are determined by adopting a least square method.
In summary, in the embodiment of the present application, based on an input value, the computer device determines the target primary index through the calculating unit and reads the target interval parameters from the first lookup table unit; the calculating unit then determines the target secondary index based on the target interval parameters and reads the target conversion parameters corresponding to the target secondary interval from the second lookup table unit; finally, the calculating unit multiplexes the quantization calculation module to perform the activation calculation based on the target conversion parameters and the input value, obtaining the activation value. By searching the non-uniform secondary lookup table, the method reduces the storage space occupied by activation calculation; and by converting the activation calculation formula and storing conversion parameters that conform to the calculation logic of the quantization calculation module, it enables the quantization calculation module to be multiplexed for activation calculation, reducing the optimization cost of the LUT.
The embodiment of the application further provides an NPU. The NPU comprises programmable logic circuits and/or program instructions which, when run, implement the activation method of a neural network described in the above embodiments.
The embodiment of the present application further provides a computer-readable storage medium, where at least one program is stored in the computer-readable storage medium, and the at least one program is loaded and executed by an NPU to implement the method for activating a neural network according to any of the above embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk.
The above description is intended to be exemplary only, and not to limit the present application, and any modifications, equivalents, improvements, etc. made within the spirit and scope of the present application are intended to be included therein.

Claims (15)

1. A method of activating a neural network, the method comprising:
determining a target primary interval to which an input value belongs from a primary lookup table, wherein the primary lookup table comprises a corresponding relation between the primary interval and interval parameters, the primary interval is obtained by dividing an input value range, the input value is a fixed point number, and the input value range is a fixed point number range;
determining a target conversion parameter of a target secondary interval to which the input value belongs from a secondary lookup table based on a target interval parameter of the target primary interval, wherein the target secondary interval belongs to the target primary interval, each primary interval is divided into at least one secondary interval, the secondary lookup table comprises a corresponding relation between the secondary interval and the conversion parameter, and the conversion parameter is obtained by converting a fitting parameter of linear fitting of an activation function corresponding to the secondary interval;
and determining an activation value corresponding to the input value based on the target conversion parameter and the input value through a multiplexing quantization algorithm module, wherein the activation value is a fixed point number, and the quantization algorithm module is used for carrying out data quantization and inverse quantization processing.
2. The method of claim 1, wherein the fitting parameters comprise a slope parameter and an intercept parameter, and the secondary lookup table comprises a correspondence between the secondary interval, a first conversion parameter, and a second conversion parameter, wherein the first conversion parameter is a rounding result of an intercept-to-slope ratio, and the second conversion parameter is an amplified result of the slope parameter.
3. The method of claim 2, wherein the target conversion parameters comprise a first target conversion parameter and a second target conversion parameter;
the determining, by the multiplexing quantization algorithm module, an activation value corresponding to the input value based on the target conversion parameter and the input value, the activation value being a fixed point number, includes:
determining a sum of the input value and the first target conversion parameter as a first intermediate value;
determining a product of a second target conversion parameter and the first intermediate value as a second intermediate value;
determining a result of right-shifting the second intermediate value as the activation value, wherein the second conversion parameter is the slope parameter multiplied by 2^N, the result of shifting to the right is the result of shifting N bits to the right, and N is a positive integer.
4. The method of claim 1, wherein the fitting parameters comprise a slope parameter and an intercept parameter, and the secondary lookup table comprises a correspondence between the secondary interval, a first conversion parameter, a second conversion parameter, and a third conversion parameter, wherein the first conversion parameter is a rounding result of an intercept slope ratio, the second conversion parameter is an amplification result of the slope parameter, the third conversion parameter is a rounding result of a product of the slope parameter and an error parameter, and the error parameter is a difference between an intercept slope ratio and the first conversion parameter.
5. The method of claim 4, wherein the target conversion parameters comprise a first target conversion parameter, a second target conversion parameter, and a third target conversion parameter;
the determining, by the multiplexing quantization algorithm module, an activation value corresponding to the input value based on the target conversion parameter and the input value, the activation value being a fixed point number, includes:
determining a sum of the input value and the first target conversion parameter as a first intermediate value;
determining a product of a second target conversion parameter and the first intermediate value as a second intermediate value;
determining a result of right-shifting the second intermediate value as a third intermediate value, wherein the second conversion parameter is the slope parameter multiplied by 2^N, the result of shifting to the right is the result of shifting N bits to the right, and N is a positive integer;
determining a sum of the third intermediate value and the third target conversion parameter as the activation value.
6. The method of claim 1, wherein the primary interval is obtained by uniformly dividing the range of input values based on a primary division granularity;
the determining the target primary interval to which the input value belongs from the primary lookup table includes:
determining a target primary index corresponding to the input value based on the input value, the minimum value of the input value range and the primary partition granularity;
and determining a primary interval corresponding to the target primary index in the primary lookup table as the target primary interval.
7. The method of claim 6, wherein the secondary intervals within each of the primary intervals are uniformly divided, and the target interval parameters comprise a starting secondary index of the secondary interval in the target primary interval, a target number of secondary intervals of the target primary interval, and an interval starting value of the target primary interval;
the determining, from a secondary lookup table, a target conversion parameter of a target secondary interval to which the input value belongs based on the target interval parameter of the target primary interval includes:
determining the secondary partition granularity of the secondary interval in the target primary interval based on the primary partition granularity and the number of the target secondary intervals;
determining an index offset based on the input value, the interval starting value of the target primary interval and the secondary partition granularity;
determining a target secondary index based on the index offset and the starting secondary index;
determining a secondary interval corresponding to the target secondary index in the secondary lookup table as the target secondary interval;
and acquiring target conversion parameters corresponding to the target secondary interval from the secondary lookup table.
8. The method of claim 7, wherein the number of secondary intervals in each of the primary intervals is positively correlated to the rate of change of the second derivative of the activation function in the primary interval.
9. The method of claim 1, wherein the fitting parameters are determined using a least squares method.
10. An apparatus for activating a neural network, the apparatus comprising:
the device comprises a first lookup table unit, a second lookup table unit and a calculation unit, wherein the calculation unit is connected to the first lookup table unit and the second lookup table unit, respectively, wherein:
the first lookup table unit is used for storing a first-level lookup table, the first-level lookup table comprises a corresponding relation between a first-level interval and an interval parameter, the first-level interval is obtained by dividing an input value range, the input value is a fixed point number, and the input value range is a fixed point number range;
the second lookup table unit is configured to store a second lookup table, where the second lookup table includes a correspondence between the secondary interval and a conversion parameter, and the conversion parameter is obtained by converting a fitting parameter of linear fitting of an activation function corresponding to the secondary interval;
the calculation unit is used for determining a target primary interval to which the input value belongs from the primary lookup table; determining a target conversion parameter of a target secondary interval to which the input value belongs from a secondary lookup table based on a target interval parameter of the target primary interval, wherein the target secondary interval belongs to the target primary interval; acquiring a target conversion parameter corresponding to the target secondary interval from the secondary lookup table; and determining an activation value corresponding to the input value based on the target conversion parameter and the input value through a multiplexing quantization algorithm module, wherein the activation value is a fixed point number, and the quantization algorithm module is used for carrying out data quantization and inverse quantization processing.
11. A neural network processor NPU comprising programmable logic circuitry and/or program instructions which, when run, are adapted to implement a method of activating a neural network as claimed in any one of claims 1 to 9.
12. A neural network processor NPU, comprising activation means of a neural network as claimed in claim 10.
13. A computer device comprising a processor and a memory, wherein at least one program is stored in the memory, and wherein the at least one program is loaded and executed by the processor to implement the method of activating a neural network as claimed in any one of claims 1 to 9.
14. A computer storage medium, wherein at least one program is stored in the computer readable storage medium, and the at least one program is loaded and executed by a processor to implement the method for activating a neural network according to any one of claims 1 to 9.
15. A computer program product, characterized in that the computer program product comprises computer instructions, the computer instructions being stored in a computer readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium, the processor executing the computer instructions causing the computer device to perform the method of activating a neural network of any one of claims 1 to 9.