CN115062768A

CN115062768A - Softmax hardware implementation method and system of logic resource limited platform

Info

Publication number: CN115062768A
Application number: CN202210790639.9A
Authority: CN
Inventors: 葛伟; 刘殊赫; 许艳鸿; 韦社年; 王一飞
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2022-07-05
Filing date: 2022-07-05
Publication date: 2022-09-16

Abstract

The invention discloses a Softmax hardware implementation method and system for a logic-resource-limited platform. For any n inputs x_1, x_2, ..., x_n, the invention realizes complex functions using only a limited set of basic arithmetic logic units, by means of equivalent function transformation, replacement of the exponential and logarithmic bases, function fitting, serial accumulation, and reuse of the exponential operation unit. It converts the combination of exponential function and division in the original function into a combination of exponential and logarithmic functions, and performs function fitting with controllable precision according to the operation characteristics and data range, which saves considerable calculation time and avoids iterative processes. Serial accumulation and functional-unit reuse reduce the hardware area and power consumption.

Description

Softmax hardware implementation method and system of logic resource limited platform

Technical Field

The invention relates to a Softmax hardware implementation method and system of a logic resource limited platform, and belongs to the technical field of hardware implementation of a neural network activation function.

Background

In the field of big data, deep neural networks (DNNs) have achieved great success, and efficient hardware architectures have long been pursued in both academia and industry. The Softmax layer is widely used across different DNNs. The Softmax function is generally used as the activation function of the output layer in classification tasks; it maps the outputs of multiple neurons into the interval (0,1) and is very widely applied in machine learning and deep learning. In particular, for multi-class problems (C > 2), the final output unit of the classifier requires numerical processing by the Softmax function. The expression is as follows:

$$\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$

Exponentiation and division in the Softmax function are computationally expensive and, particularly in embedded systems, cannot be computed directly. Many prior-art implementations based on lookup tables consume excessive resources and are highly complex. There is therefore a need for a sound way of deploying the function on hardware that guarantees low resource consumption and efficient implementation.

Disclosure of Invention

Technical problem: in view of the above problems, the invention provides a Softmax hardware implementation method for a logic-resource-limited platform, which addresses the high cost and low efficiency of hardware implementations of the Softmax function by using function fitting and serial accumulation.

Technical scheme: a Softmax hardware implementation method for a logic-resource-limited platform comprises the following steps:

1) transforming the original Softmax function

$$R(i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$

into:

$$R(i) = e^{\,x_i - \ln\left(\sum_{j=1}^{n} e^{x_j}\right)}$$

wherein n is the total number of Softmax inputs and i is the index of the corresponding input and output, i = 1, …, n;

2) calculating the exponential function of each gated input x_1, x_2, ..., x_n, with base e replaced by base 2:

$$e^{x_i} = 2^{x_i \cdot \log_2 e}$$

3) calculating the accumulated sum of the exponentials of the inputs x_1, x_2, ..., x_n from step 2):

$$f = \sum_{i=1}^{n} e^{x_i}$$

4) calculating the natural logarithm of the accumulation result in the step 3):

f_ln＝ln(f)

5) calculating, for each gated input x_1, x_2, ..., x_n, the exponential of the input minus the result f_ln of step 4), with base e replaced by base 2:

$$R(i) = e^{x_i - f_{\ln}} = 2^{(x_i - f_{\ln}) \cdot \log_2 e}$$

6) storing each result R(i) calculated in step 5) in a register to obtain the final total output R.
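Steps 1) to 6) can be sketched in floating point as follows. This is only a minimal numerical illustration of the transformation, not the fixed-point circuit: the function names `softmax_direct` and `softmax_log_domain` are hypothetical, and exact `math` functions stand in for the fitted hardware units.

```python
import math

LOG2_E = math.log2(math.e)  # constant used to replace base e by base 2


def softmax_direct(xs):
    """Reference form: exponentials followed by a division."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_log_domain(xs):
    """Transformed form: R(i) = 2^((x_i - f_ln) * log2(e)), no division."""
    f = 0.0
    for x in xs:                      # steps 2) and 3): serial accumulation
        f += 2.0 ** (x * LOG2_E)
    f_ln = math.log(f)                # step 4): natural logarithm of the sum
    # step 5): exponent of (x_i - f_ln), again computed in base 2
    return [2.0 ** ((x - f_ln) * LOG2_E) for x in xs]
```

Both functions return the same values up to rounding, which is the equivalence the transformation in step 1) relies on.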

Further, steps 2) and 5) are calculated with the same exponential operation module, in which base e is replaced by base 2. The exponential operation module comprises two adders, a constant multiplier and a shifter:

$$e^{y} = 2^{y \cdot \log_2 e}$$

where u is the integer part of the fixed-point number |y·log_2 e| and v is its truncated fractional part. In fixed-point hardware the absolute value is realized by a sign-bit decision: the absolute value of a positive number is the number itself, and the absolute value of a negative number is obtained by inverting its bits and adding one. For example, steps 2) and 5) calculate

$$e^{x_i} \quad \text{or} \quad e^{x_i - f_{\ln}}$$

The first adder calculates x_i - 0 or x_i - f_ln; the constant multiplier calculates x_i·log_2 e or (x_i - f_ln)·log_2 e; the integer part of the absolute value is extracted to obtain u; the second adder implements the fitting function 2^v ≈ v + b_1, where b_1 is a constant; finally, 2^v is shifted left or right by u bits to obtain the result.
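The data path of the exponential operation module can be illustrated with a small floating-point model. This is a sketch under stated assumptions rather than the patented circuit: the name `approx_exp` is hypothetical, the fraction v is kept signed in (-1, 1) and fitted with two linear segments, `A_NEG` and `B_NEG` follow the values quoted in the detailed description for the (-1, 0) segment, and `B_POS` is an illustrative constant chosen for this sketch (the mean of 2^v - v over [0, 1)).

```python
import math

LOG2_E = math.log2(math.e)
B_POS = 0.9427                 # assumed constant for 2^v ~ v + b on [0, 1)
A_NEG, B_NEG = 0.4966, 0.9711  # 2^v ~ a*v + b on (-1, 0), values from the text


def approx_exp(y: float) -> float:
    """Approximate e^y with one multiply, one linear fit and one shift."""
    z = y * LOG2_E             # constant multiplier: y * log2(e)
    u = int(z)                 # integer part, truncated toward zero
    v = z - u                  # fractional part, same sign as z
    if v >= 0:
        mantissa = v + B_POS           # adder-only fit on [0, 1)
    else:
        mantissa = A_NEG * v + B_NEG   # a*v is roughly a one-bit right shift
    return math.ldexp(mantissa, u)     # shift left (u > 0) or right (u < 0)
```

With these constants the relative error of the model stays within a few percent of `math.exp`; a fixed-point implementation would add quantization error on top of that.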

Further, step 3) is calculated with a serial accumulation module, whose accumulation enable signal is controlled by a counter.

Further, step 4) is calculated with a logarithm operation module, in which base e is replaced by base 2. The logarithm operation module comprises a leading-one detector (LOD), a decoder, a right shifter, a constant adder and an adder:

$$\ln(f) = \ln 2 \cdot \log_2 f = \ln 2 \cdot (w + \log_2 k)$$

where t is the intermediate value in which only the bit at the position of the most significant 1 of f is 1 and all other bits are 0, w is the index of the most significant 1 of f, and k is the remainder of f after scaling, with k ∈ [1,2). For example, for sixteen-bit fixed-point numbers:

if f is 16'b0000_1011_1111.0011,

then t is 16'b0000_1000_0000.0000, w is 4, and k is 16'b0000_1.011_1111_0011;

if f is 16'b0000_0011_1111.0011,

then t is 16'b0000_0010_0000.0000, w is 6, and k is 16'b0000_001.1_1111_0011.
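The two worked examples can be reproduced with a short sketch. The function name `leading_one` and the bit-string input format (most significant bit first, separators and binary point removed) are hypothetical; the position is counted from the most significant bit, which is how the examples above appear to number w.

```python
def leading_one(f_bits: str):
    """Return (t, pos, k) for a bit string given MSB first.

    t   : one-hot string marking the most significant 1 of f
    pos : position of that 1, counted from the MSB (0-based)
    k   : f rescaled so that the leading 1 sits just left of the binary point
    """
    pos = f_bits.find("1")
    if pos < 0:
        raise ValueError("f must be non-zero")
    t = "0" * pos + "1" + "0" * (len(f_bits) - pos - 1)
    # Dividing by the weight of the leading 1 gives a value in [1, 2).
    k = int(f_bits, 2) / (1 << (len(f_bits) - 1 - pos))
    return t, pos, k
```

For example, `leading_one("0000101111110011")` corresponds to the first case above, with the one-hot value t, the position of the leading 1, and a k between 1 and 2.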

Further, in step 4), ln(f) is calculated as follows: the LOD produces the intermediate value t in which only the bit at the position of the most significant 1 of f is 1 and all other bits are 0; the decoder takes t as input and outputs the index w of the most significant 1 of f; the right shifter shifts f right by w bits to obtain k; the constant adder implements the fitting function log_2 k ≈ k + b_2, where b_2 is a constant; finally, the constant multiplier and adder calculate ln2·(w + k + b_2).
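The logarithm data path can be modelled in a few lines. This is a hedged sketch, not the circuit itself: the name `approx_ln` and the raw-integer input convention are hypothetical, `B2 = -0.9485` is the constant quoted in the detailed description, and w is taken here relative to the binary point (an assumption, so that ln(f) = ln2·(w + log_2 k) holds numerically).

```python
import math

FRAC_BITS = 4    # assumed number of fractional bits of the fixed-point input
B2 = -0.9485     # fitting constant for log2(k) ~ k + b2 on [1, 2)


def approx_ln(raw: int, frac_bits: int = FRAC_BITS) -> float:
    """Approximate ln(f) for f = raw / 2**frac_bits, with raw > 0."""
    if raw <= 0:
        raise ValueError("f must be positive")
    msb = raw.bit_length() - 1   # leading-one detector plus decoder
    w = msb - frac_bits          # exponent of the leading 1 relative to the point
    k = raw / (1 << msb)         # right shift so that k lies in [1, 2)
    # constant adder (k + B2), adder (w + ...), constant multiplier (ln 2)
    return math.log(2.0) * (w + k + B2)
```

With this constant the absolute error against `math.log` stays below about 0.04 over the tested range, before any fixed-point quantization is applied.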

Advantageous effects: the Softmax hardware implementation method and system for a logic-resource-limited platform reduce the complexity of implementing Softmax in hardware. Through equivalent function transformation, replacement of the exponential and logarithmic bases, function fitting, serial accumulation and reuse of the exponential operation unit, complex functions are realized using only a limited set of basic arithmetic logic units, which reduces the hardware area and power consumption.

Compared with the prior art, the method greatly reduces circuit complexity, power consumption and area. It has neither the parameter-storage requirement of LUT (look-up table) based methods nor the iterative processing-time requirement of CORDIC (coordinate rotation digital computer) based methods, so it increases throughput and addresses the difficulty of implementing Softmax in hardware.

Drawings

FIG. 1 is a diagram of the Softmax hardware implementation for a logic-resource-limited platform according to the present invention;

FIG. 2 is a schematic diagram of the exponential function circuit implementation;

FIG. 3 is a schematic diagram of the logarithmic function circuit implementation.

Detailed Description

The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.

As shown in FIG. 1, the inputs of the Softmax hardware implementation system for a logic-resource-limited platform are x_1, x_2, ..., x_n and the outputs are r_1, r_2, ..., r_n. The method comprises the following steps:

(1) After the operands x_1, x_2, ..., x_n are input, LnF_en is set to 0 and the counter Cnt[$clog2(n)-1:0] is enabled. Cnt[$clog2(n)-1:0] is used as the gating signal to gate x_1, x_2, ..., x_n in turn, and the exponential function of each gated input is calculated, with base e replaced by base 2:

$$e^{x_i} = 2^{x_i \cdot \log_2 e}$$

(2) The accumulated sum of the exponentials of the inputs x_1, x_2, ..., x_n from step (1) is calculated. Acc_en is set to 1; when Cnt[$clog2(n)-1:0] equals the number of inputs n, Acc_en is set to 0 and the accumulation of the exponentials of all inputs is complete.

(3) The natural logarithm f_ln of the accumulation result of step (2) is calculated, with base e replaced by base 2.

(4) The counter Cnt[$clog2(n)-1:0] is reset and re-enabled, LnF_en is set to 1, and for each gated input x_1, x_2, ..., x_n the exponential of the input minus the result f_ln of step (3) is calculated, with base e replaced by base 2:

$$R(i) = e^{x_i - f_{\ln}} = 2^{(x_i - f_{\ln}) \cdot \log_2 e}$$

(5) Each result R(i) calculated in step (4) is stored in a register to obtain the final total output R.

With this design, the invention meets the requirements placed on a Softmax hardware circuit in practical applications: high efficiency, low complexity, small area and low energy consumption. In theory, one Softmax calculation over n inputs completes in 2n clock cycles, where n is the number of inputs, while the area and power advantages of the circuit are preserved.
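The two-pass schedule behind the 2n-cycle figure can be illustrated with a behavioural model. This is a sketch under assumptions: the function name `softmax_serial` is hypothetical, one loop iteration stands for one clock cycle, exact `math.exp` and `math.log` stand in for the fitted units, and the logarithm is assumed to add no extra cycle between the two passes.

```python
import math


def softmax_serial(xs):
    """Model the shared exponent unit over two counter passes.

    Returns (outputs, cycles); cycles counts one iteration per gated input.
    """
    n = len(xs)
    if n == 0:
        return [], 0
    f, f_ln = 0.0, 0.0
    regs = [0.0] * n                  # output registers R(i)
    lnf_en, acc_en = 0, 1             # pass 1: accumulate, subtract 0
    cnt, cycles = 0, 0
    while True:
        operand = xs[cnt] - (f_ln if lnf_en else 0.0)  # first adder
        e = math.exp(operand)                          # shared exponent unit
        if acc_en:
            f += e                    # serial accumulation
        else:
            regs[cnt] = e             # store R(i)
        cnt += 1
        cycles += 1
        if cnt == n:
            if lnf_en:
                break                 # second pass finished
            acc_en, lnf_en, cnt = 0, 1, 0   # stop accumulating, reset counter
            f_ln = math.log(f)              # logarithm unit
    return regs, cycles
```

The model reuses a single exponent evaluation per cycle in both passes, which is the reuse that trades latency for area in the description above.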

As shown in FIG. 2, the calculation process of the exponential function circuit is as follows:

(1) The input x_i or x_i - f_ln is multiplied by the fixed-point constant log_2 e to obtain the fixed-point output {u_i, v_i}, where u_i is the integer part and v_i is the fractional part, with -1 < v_i < 1.

(2) 2^v is calculated by function fitting using f(v_i) = v_i + b. Because x_i and x_i - f_ln can take both positive and negative values, a two-segment piecewise function is needed, and different fitting parameters b_1 or b_2 are selected according to the sign of u_i. The segment intervals are (-1,0) and [0,1). For example, if the input is a 16-bit fixed-point number with a fractional width of 4 bits, take b_1 = 6'b01_0100 and b_2 = 6'b00_1111; the fit over [0,1) has a coefficient of determination of 0.9919. The fit over (-1,0) has a slightly worse coefficient of determination, so within (-1,0) the fit f(v_i) = a·v_i + b is used to improve accuracy, where a = 0.4966 and b = 0.9711. For example, if the input is a 16-bit fixed-point number with a fractional width of 4 bits, a·v_i can be obtained by shifting v_i right by one bit, at no additional hardware cost; taking b_2 = 6'b00_1111, the fit has a coefficient of determination of 0.9909.

(3) The sign of u_i is determined from its most significant (sign) bit; if u_i is negative, it is inverted and incremented by one to obtain its absolute value. According to the sign of u_i, the value 2^v calculated in step (2) is shifted left or right by |u_i| bits to obtain the final result of the exponential function:

$$e^{y} = 2^{y \cdot \log_2 e} = 2^{u_i} \cdot 2^{v_i}, \qquad y = x_i \ \text{or}\ x_i - f_{\ln}$$
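The linear fit on the (-1, 0) segment in step (2) can be checked numerically. The sketch below runs an ordinary least-squares fit of 2^v over uniformly sampled points and reports the coefficient of determination; the helper name `linear_fit_r2` and the sampling scheme are assumptions, since the text does not state how its coefficients were obtained.

```python
def linear_fit_r2(func, lo, hi, samples=1000):
    """Least-squares line a*v + b for func on (lo, hi); returns (a, b, r2)."""
    vs = [lo + (hi - lo) * (i + 0.5) / samples for i in range(samples)]
    ys = [func(v) for v in vs]
    mean_v = sum(vs) / samples
    mean_y = sum(ys) / samples
    s_vv = sum((v - mean_v) ** 2 for v in vs)
    s_vy = sum((v - mean_v) * (y - mean_y) for v, y in zip(vs, ys))
    a = s_vy / s_vv                    # slope
    b = mean_y - a * mean_v            # intercept
    ss_res = sum((y - (a * v + b)) ** 2 for v, y in zip(vs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


a, b, r2 = linear_fit_r2(lambda v: 2.0 ** v, -1.0, 0.0)
```

The resulting slope is close to one half, which is why a·v_i can be approximated by a one-bit right shift, and the intercept lands near the quoted b.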

As shown in FIG. 3, the calculation process of the logarithmic function circuit is as follows:

(1) The LOD is used to obtain the intermediate value t, in which only the bit at the position of the most significant 1 of the input f is 1 and all other bits are 0.

(2) The decoder decodes t to obtain the index w of the most significant 1 of f.

(3) The shifter shifts the input f right by w bits to obtain k, where k ∈ [1,2).

(4) The adder calculates log_2 k with the fitting function log_2 k ≈ k + b_2, where b_2 is a constant. For example, if the output f_ln of the logarithmic function circuit has a total bit width of 6 and a fractional width of 4, then b_2 = -0.9485, which in fixed point is 6'b11.0001; the fit has a coefficient of determination of 0.9906.

(5) w is added to the log_2 k obtained by function fitting in step (4), and the sum is multiplied by ln2 with a constant multiplier to obtain the final result of the logarithmic function: ln(f) = ln2·log_2 f = ln2·(w + log_2 k)
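The fixed-point encoding of the fitting constant in step (4) can be reproduced with a short helper. This is an illustrative sketch: the name `to_fixed_bits` is hypothetical, and truncation toward zero followed by two's-complement wrapping is an assumption that happens to match the bit patterns quoted in the text.

```python
def to_fixed_bits(value: float, total_bits: int = 6, frac_bits: int = 4) -> str:
    """Encode value as a two's-complement fixed-point bit string."""
    raw = int(value * (1 << frac_bits))      # scale, truncate toward zero
    raw &= (1 << total_bits) - 1             # wrap negatives into two's complement
    bits = format(raw, f"0{total_bits}b")
    int_bits = total_bits - frac_bits
    return bits[:int_bits] + "." + bits[int_bits:]


encoded_b2 = to_fixed_bits(-0.9485)
```

Here `encoded_b2` is the 6-bit pattern for the logarithm constant with 4 fractional bits; other rounding modes would give different low-order bits.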

The invention adopts equivalent function transformation, replacement of the exponential and logarithmic bases, function fitting, serial accumulation and reuse of the exponential operation unit, and realizes complex functions using only a limited set of basic arithmetic logic units. It converts the combination of exponential function and division in the original function into a combination of exponential and logarithmic functions, and performs function fitting with controllable precision according to the operation characteristics and data range, which saves considerable calculation time and avoids iterative processes. Serial accumulation and functional-unit reuse reduce the hardware area and power consumption.

Claims

1. A Softmax hardware implementation method for a logic-resource-limited platform, characterized by comprising the following steps:

1) transforming the original Softmax function

$$R(i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$

into:

$$R(i) = e^{\,x_i - \ln\left(\sum_{j=1}^{n} e^{x_j}\right)}$$

2) calculating the exponential of each gated input x_1, x_2, ..., x_n, using the counter value as the gating signal;

3) calculating the accumulated sum of the exponentials of the gated inputs x_1, x_2, ..., x_n from step 2):

$$f = \sum_{i=1}^{n} e^{x_i}$$

4) calculating the natural logarithm of the accumulation result of step 3): f_ln = ln(f);

5) calculating, for each input x_1, x_2, ..., x_n, the exponential of the input minus the calculation result of step 4);

6) storing each result R(i) calculated in step 5) in a register to obtain the final total output R.

2. The Softmax hardware implementation method for a logic-resource-limited platform according to claim 1, wherein:

the exponential operation in step 2) and step 5) comprises replacing the base e with the base 2, so that the exponential function becomes

$$e^{x_i} = 2^{x_i \cdot \log_2 e}$$

and decomposing x_i·log_2 e:

$$2^{x_i \cdot \log_2 e} = 2^{u + v} = 2^{u} \cdot 2^{v}$$

wherein u represents the integer part of x_i·log_2 e and v represents the fractional part.

3. The Softmax hardware implementation method for a logic-resource-limited platform according to claim 1, wherein the logarithmic operation in step 4) comprises: replacing the base e of the logarithm with the base 2, so that the logarithm function becomes ln(f) = ln2·log_2 f, and decomposing log_2 f: ln(f) = ln2·log_2 f = ln2·(w + log_2 k), where w is the index of the most significant 1 in the binary representation of f, and k is the remainder variable with k ∈ [1,2).

4. A Softmax hardware implementation system for a logic-resource-limited platform based on the method of any one of claims 1-3, characterized by comprising the following units:

an exponential operation unit, which realizes the exponential operation through base transformation and linear fitting;

a serial accumulation unit, which sums the exponentials of the plurality of inputs;

a logarithmic operation unit, which realizes the logarithmic operation through base transformation and linear fitting.

5. The Softmax hardware implementation system for a logic-resource-limited platform according to claim 4, wherein the exponential operation unit comprises two adders, a constant multiplier and a shifter, the shifter supporting both left and right shifting; the first adder calculates x_i - 0 or x_i - f_ln; the constant multiplier calculates x_i·log_2 e or (x_i - f_ln)·log_2 e; the integer part of the absolute value is extracted to obtain u; the second adder implements the fitting function 2^v ≈ v + b_1, where b_1 is a constant; finally, 2^v is shifted left or right by u bits to obtain the result.

6. The Softmax hardware implementation system for a logic-resource-limited platform according to claim 4, wherein the logarithmic operation unit comprises a leading-one detector, a decoder, a shifter and a constant adder, the shifter supporting only right shifting; the unit is used for calculating ln(f): the leading-one detector produces the intermediate value t in which only the bit at the position of the most significant 1 of f is 1 and all other bits are 0; the decoder takes t as input and outputs the index w of the most significant 1 of f; the right shifter shifts f right by w bits to obtain k; the constant adder implements the fitting function log_2 k ≈ k + b_2, where b_2 is a constant; finally, a constant multiplier and an adder calculate the result of the logarithmic operation unit: ln(f) = ln2·(w + k + b_2).

7. The Softmax hardware implementation system for a logic-resource-limited platform according to claim 4, wherein the serial accumulation unit can accumulate any number n of inputs, the accumulation enable being controlled by a counter, and the accumulation stopping when the counter value equals n.