CN115062768A - Softmax hardware implementation method and system of logic resource limited platform - Google Patents

Softmax hardware implementation method and system of logic resource limited platform

Info

Publication number
CN115062768A
Authority
CN
China
Prior art keywords
softmax
function
log
hardware implementation
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210790639.9A
Other languages
Chinese (zh)
Inventor
葛伟
刘殊赫
许艳鸿
韦社年
王一飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202210790639.9A priority Critical patent/CN115062768A/en
Publication of CN115062768A publication Critical patent/CN115062768A/en
Pending legal-status Critical Current

Abstract

The invention discloses a Softmax hardware implementation method and system for a logic resource limited platform. For any n inputs x_1, x_2, ..., x_n, the invention implements the complex function using only a limited set of basic arithmetic logic units, by means of equivalent function transformation, replacement of the exponential and logarithmic bases, function fitting, serial accumulation, and reuse of the exponential operation unit. It converts the combination of exponentiation and division in the original function into a combination of an exponential function and a logarithmic function, performs function fitting with controllable precision according to the operation characteristics and the data range, saves a large amount of computation time and iteration, and effectively reduces hardware area and power consumption through serial accumulation and functional-unit reuse.

Description

Softmax hardware implementation method and system of logic resource limited platform
Technical Field
The invention relates to a Softmax hardware implementation method and system of a logic resource limited platform, and belongs to the technical field of hardware implementation of a neural network activation function.
Background
In the field of big data, deep neural networks (DNNs) have achieved great success, and an efficient hardware architecture has long been a goal pursued by academia and industry. The Softmax layer is widely used across different DNNs. The Softmax function is generally used as the activation function of the output layer in classification tasks; it maps the outputs of multiple neurons into the interval (0, 1) and is very widely used in machine learning and deep learning. In particular, for multi-class (C > 2) problems, the final output units of the classifier require numerical processing by the Softmax function. The expression is as follows:
$$r_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}, \quad i = 1, \dots, n$$
Exponentiation and division in the Softmax function are expensive to compute and cannot be computed directly, particularly in embedded systems. Many prior-art implementations based on lookup tables consume excessive resources and are highly complex. A sound way of deploying the function in hardware is therefore needed, one that keeps resource consumption low and can be implemented efficiently.
Disclosure of Invention
Technical problem: in view of the above problems, the invention provides a Softmax hardware implementation method for a logic resource limited platform, which uses function fitting and serial accumulation to address the high cost and low efficiency of hardware implementations of the Softmax function.
Technical solution: a Softmax hardware implementation method for a logic resource limited platform comprises the following steps:
1) The original Softmax function
$$r_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$
is transformed into:
$$r_i = e^{\,x_i - \ln\left(\sum_{j=1}^{n} e^{x_j}\right)}$$
where n is the total number of Softmax inputs and i is the index of the corresponding input and output, i = 1, ..., n;
2) calculating the exponential function of each gated input x_1, x_2, ..., x_n
$$e^{x_i}$$
where the base e is replaced by the base 2:
$$e^{x_i} = 2^{x_i \cdot \log_2 e}$$
3) calculating the accumulated sum of the exponentials of the inputs x_1, x_2, ..., x_n from step 2):
$$f = \sum_{i=1}^{n} e^{x_i}$$
4) calculating the natural logarithm of the accumulation result in the step 3):
f_ln=ln(f)
5) calculating, for each gated input x_1, x_2, ..., x_n, the exponential of the difference between that input and the result f_ln of step 4), where the base e is replaced by the base 2:
$$r_i = e^{x_i - \mathrm{f\_ln}} = 2^{(x_i - \mathrm{f\_ln}) \cdot \log_2 e}$$
6) storing each result r_i calculated in step 5) in a register to obtain the complete output R.
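The six steps can be checked numerically with a short floating-point sketch. It is not the fixed-point circuit: the function names are illustrative, and the exponential and logarithm are taken from Python's `math` module rather than from the fitted units described later.

```python
import math

def softmax_direct(xs):
    """Reference form: r_i = e^{x_i} / sum_j e^{x_j}."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_log_form(xs):
    """Division-free form following steps 2) to 6) (illustrative name)."""
    log2e = math.log2(math.e)
    # Steps 2) and 3): serially accumulate e^{x_i} = 2^{x_i * log2(e)}
    f = 0.0
    for x in xs:
        f += 2.0 ** (x * log2e)
    # Step 4): natural logarithm of the accumulated sum
    f_ln = math.log(f)
    # Steps 5) and 6): r_i = 2^{(x_i - f_ln) * log2(e)}, collected in a list
    return [2.0 ** ((x - f_ln) * log2e) for x in xs]
```

Both functions return the same values up to rounding, which illustrates why the division can be traded for one logarithm and a second pass through the exponential unit.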
Further, steps 2) and 5) are computed with the same exponential operation module, in which the base e is replaced by the base 2. The exponential operation module comprises two adders, a constant multiplier, and a shifter:
$$e^{y} = 2^{y \cdot \log_2 e}, \qquad |y \cdot \log_2 e| = u + v$$
where u is the integer part of the fixed-point number |y·log_2 e| and v is its truncated fractional part. In fixed-point hardware the absolute value is obtained from a sign-bit decision: the absolute value of a positive number is the number itself, and the absolute value of a negative number is obtained by inverting its bits and adding one. For example, steps 2) and 5) calculate
$$e^{x_i}$$
or
$$e^{x_i - \mathrm{f\_ln}}$$
as follows. The first adder calculates x_i - 0 or x_i - f_ln; the constant multiplier calculates x_i·log_2 e or (x_i - f_ln)·log_2 e; truncating the absolute value to its integer part gives u; the second adder implements the fitting function 2^v ≈ v + b_1, where b_1 is a constant; finally, shifting 2^v left or right by u bits gives the result.
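The sign-magnitude decomposition can be illustrated with a floating-point sketch. It deliberately uses the exact value of 2^v instead of the linear fit, so that only the split into u and v and the shift direction are exercised. The function names are hypothetical, and `math.ldexp` stands in for the hardware shifter.

```python
import math

LOG2E = math.log2(math.e)  # the constant fed to the constant multiplier

def split_magnitude(y):
    """Return (negative, u, v) with |y * log2(e)| = u + v and 0 <= v < 1."""
    z = y * LOG2E
    mag = abs(z)          # sign-bit decision, then magnitude
    u = int(mag)          # integer part of the magnitude
    v = mag - u           # fractional part of the magnitude
    return z < 0, u, v

def exp_from_parts(y):
    """Rebuild e^y from the sign, u and v using a shift by u bits."""
    negative, u, v = split_magnitude(y)
    if negative:
        return math.ldexp(2.0 ** -v, -u)   # right shift by u
    return math.ldexp(2.0 ** v, u)         # left shift by u
```

For y = -3, for instance, y·log_2 e is about -4.328, so the sketch yields u = 4 and a right shift by four bits.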
Further, step 3) is computed with a serial accumulation module, whose accumulation enable signal is controlled by a counter.
Further, step 4) is computed with a logarithm operation module, in which the base e is replaced by the base 2. The logarithm operation module comprises a leading-one detector (LOD), a decoder, a right shifter, a constant adder, and an adder:
$$\ln(f) = \ln 2 \cdot \log_2 f = \ln 2 \cdot (w + \log_2 k)$$
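Here t is the intermediate value in which only the bit at the position of the leading one of f is 1 and all other bits are 0, w is the index of the leading one of f, and k is the remainder of f after scaling, with k ∈ [1, 2). For example, for 16-bit fixed-point numbers:
if f = 16'b0000_1011_1111.0011,
then t = 16'b0000_1000_0000.0000, w = 4, k = 16'b0000_1.011_1111_0011;
if f = 16'b0000_0011_1111.0011,
then t = 16'b0000_0010_0000.0000, w = 6, k = 16'b0000_001.1_1111_0011.
In these examples the quoted w is the position of the leading one counted from the most significant bit. The corresponding powers of two are 2^7 and 2^5 (f = 2^7·k and f = 2^5·k respectively), and it is this exponent that enters ln(f) = ln2·(w + log_2 k).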
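The two bit patterns above can be reproduced with integer operations. This is a sketch that assumes a 16-bit word with 4 fractional bits, as the binary point in the examples suggests. The helper names are hypothetical, and `int.bit_length` plays the role of the leading-one detector and decoder.

```python
WIDTH = 16      # word width of the examples
FRAC_BITS = 4   # assumed number of bits after the binary point

def leading_one(raw):
    """Return (t, pos): t keeps only the leading one, pos counts from the LSB."""
    if raw <= 0:
        raise ValueError("f must be positive")
    pos = raw.bit_length() - 1
    return 1 << pos, pos

def decompose(raw):
    """Return (t, index_from_msb, exponent, k) for a raw fixed-point word."""
    t, pos = leading_one(raw)
    index_from_msb = WIDTH - 1 - pos   # the value quoted as w in the examples
    exponent = pos - FRAC_BITS         # floor(log2(f)), so f = 2**exponent * k
    k = raw / t                        # scaled remainder, 1 <= k < 2
    return t, index_from_msb, exponent, k
```

Calling `decompose(0b0000_1011_1111_0011)` gives t = 0b0000_1000_0000_0000, an index of 4 from the most significant bit, an exponent of 7, and k = 3059/2048. The second example gives an index of 6, an exponent of 5, and k = 1011/512.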
Further, in step 4), ln(f) is calculated as follows: the LOD produces the intermediate value t in which only the leading-one bit of f is 1 and all other bits are 0; the decoder takes t as input and produces the index w of the leading one of f; the right shifter shifts f right by w bits to obtain k; the constant adder implements the fitting function log_2 k ≈ k + b_2, where b_2 is a constant; finally, the constant multiplier and the adder calculate ln2·(w + k + b_2).
Beneficial effects: the Softmax hardware implementation method and system for a logic resource limited platform effectively reduce the complexity of implementing Softmax in hardware. Through equivalent function transformation, replacement of the exponential and logarithmic bases, function fitting, serial accumulation, and reuse of the exponential operation unit, the complex function is implemented using only a limited set of basic arithmetic logic units, which effectively reduces hardware area and power consumption.
Compared with the prior art, the method greatly reduces circuit complexity, power consumption, and area. It avoids the parameter storage required by LUT (look-up table) based methods and the iteration time required by CORDIC (coordinate rotation digital computer) based methods, increases throughput, and addresses the difficulty of implementing Softmax in hardware.
Drawings
FIG. 1 is a diagram of a Softmax hardware implementation of a logical resource constrained platform according to the present invention;
FIG. 2 is a schematic diagram of an implementation of an exponential function circuit;
FIG. 3 is a schematic diagram of an implementation of a logarithmic function circuit.
Detailed Description
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
As shown in FIG. 1, the Softmax hardware implementation system for a logic resource limited platform takes x_1, x_2, ..., x_n as inputs and produces r_1, r_2, ..., r_n as outputs. The method comprises the following steps:
(1) After the operands x_1, x_2, ..., x_n are input, LnF_en is set to 0 and the counter Cnt[$clog2(n)-1:0] is enabled. Cnt[$clog2(n)-1:0] is used as the gating signal to select x_1, x_2, ..., x_n in turn, and the exponential function of each gated input is calculated:
$$e^{x_i}$$
where the base e is replaced by the base 2:
$$e^{x_i} = 2^{x_i \cdot \log_2 e}$$
(2) The accumulated sum of the exponentials of the inputs x_1, x_2, ..., x_n from step (1) is calculated. Acc_en is set to 1; when Cnt[$clog2(n)-1:0] equals the number of inputs n, Acc_en is set to 0, which completes the accumulation of the exponentials of all x_1, x_2, ..., x_n.
(3) The natural logarithm f_ln of the accumulation result from step (2) is calculated, with the base e replaced by the base 2.
(4) The counter Cnt[$clog2(n)-1:0] is reset and re-enabled, and LnF_en is set to 1. For each gated input x_1, x_2, ..., x_n, the exponential of the difference between that input and the result f_ln of step (3) is calculated, with the base e replaced by the base 2:
$$r_i = e^{x_i - \mathrm{f\_ln}} = 2^{(x_i - \mathrm{f\_ln}) \cdot \log_2 e}$$
(5) Each result r_i calculated in step (4) is stored in a register to obtain the complete output R.
With this design, the Softmax hardware circuit meets the requirements of practical applications for high efficiency, low complexity, small area, and low energy consumption. In theory, one Softmax computation over n inputs completes in 2n clock cycles, where n is the number of inputs, while the area and power advantages of the circuit are preserved.
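The 2n-cycle figure follows from the two counter-driven passes, which a small behavioural sketch makes visible. It assumes, purely for illustration, that the logarithm unit adds no cycle of its own and that one input is processed per clock. The signal names LnF_en and Acc_en appear only in comments, and the function name is hypothetical.

```python
import math

def softmax_two_pass(xs):
    """Behavioural model of the control flow; returns (outputs, cycle count)."""
    n = len(xs)
    cycles = 0
    f = 0.0
    # Pass 1 (LnF_en = 0, Acc_en = 1): the counter gates one input per cycle
    for cnt in range(n):
        f += math.exp(xs[cnt] - 0.0)       # shared exponent unit, subtrahend 0
        cycles += 1
    # Logarithm unit, modelled here as taking no extra cycle (assumption)
    f_ln = math.log(f)
    # Pass 2 (LnF_en = 1): counter restarted, same exponent unit reused
    regs = [0.0] * n
    for cnt in range(n):
        regs[cnt] = math.exp(xs[cnt] - f_ln)
        cycles += 1
    return regs, cycles
```

For four inputs the model reports eight cycles, and the register contents sum to one.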
As shown in fig. 2, the calculation process of the exponential function circuit:
(1) The input x_i or x_i - f_ln is multiplied by the fixed-point constant log_2 e to obtain the fixed-point output {u_i, v_i}, where u_i is the integer part and v_i is the fractional part, with -1 < v_i < 1.
(2) 2^{v_i} is calculated by function fitting with f(v_i) = v_i + b. Although x_i ≥ 0, x_i - f_ln takes both positive and negative values, so a two-segment piecewise function is needed: the fitting parameter b_1 or b_2 is selected according to the sign of u_i, and the segments are (-1, 0) and [0, 1). For example, if the input is a 16-bit fixed-point number with a fractional width of 4 bits, then b_1 = 6'b01_0100 and b_2 = 6'b00_1111, and the coefficient of determination of the fit on [0, 1) is 0.9919. The coefficient of determination on (-1, 0) is slightly worse, so on (-1, 0) the fit f(v_i) = a·v_i + b can be used to improve accuracy, where a = 0.4966 and b = 0.9711. For example, if the input is a 16-bit fixed-point number with a fractional width of 4 bits, a·v_i can be obtained by shifting v_i right by one bit, at no additional hardware cost; taking b_2 = 6'b00_1111, the coefficient of determination of the fit is 0.9909.
(3) The sign bit (most significant bit) of u_i determines whether it is positive or negative; if it is negative, its bits are inverted and one is added to obtain the absolute value. According to the sign of u_i, the value 2^{v_i} calculated in step (2) is shifted left or right by |u_i| bits to obtain the final result of the exponential function:
$$2^{u_i + v_i} = \begin{cases} 2^{v_i} \ll |u_i|, & u_i \ge 0 \\ 2^{v_i} \gg |u_i|, & u_i < 0 \end{cases}$$
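The piecewise fit and the final shift can be approximated in floating point as follows. This is a sketch under stated assumptions, not the patented datapath: the slope on (-1, 0) is taken as 0.5 (the one-bit right shift mentioned above) with the intercept 0.9711 from the text, while the constant used on [0, 1), `B_POS`, is an assumed least-squares value (1/ln 2 - 1/2) rather than the 6-bit constant listed above. The segment is selected from the sign of v_i, which shares its sign with u_i.

```python
import math

LOG2E = math.log2(math.e)
B_POS = 1.0 / math.log(2.0) - 0.5   # assumed constant for 0 <= v < 1
B_NEG = 0.9711                      # intercept quoted for the (-1, 0) segment

def pow2_frac(v):
    """Piecewise linear stand-in for 2**v on -1 < v < 1."""
    if v >= 0.0:
        return v + B_POS
    return 0.5 * v + B_NEG          # 0.5 * v is a one-bit right shift

def exp_fit(y):
    """Approximate e^y as the fitted 2**v shifted by u bits."""
    z = y * LOG2E
    u = math.trunc(z)               # signed integer part
    v = z - u                       # signed fractional part, |v| < 1
    return math.ldexp(pow2_frac(v), u)   # u < 0 acts as a right shift
```

With these constants the relative error of the sketch stays below about 6 percent over the tested range. The fixed-point circuit would add quantization error on top of that.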
as shown in fig. 3, the calculation process of the logarithmic function circuit:
(1) the LOD is used to obtain the intermediate value t of the input f where the most significant bit is 1 and the other bits are 0.
(2) And decoding the input t by using a decoder to obtain the index w of the highest bit of f.
(3) The input f is shifted right by w bits with a shifter to obtain k, where k ∈ [1, 2).
(4) log_2 k is calculated with the fitting function using an adder: log_2 k ≈ k + b_2, where b_2 is a constant. For example, if the output f_ln of the logarithmic function circuit has a total width of 6 bits with 4 fractional bits, b = -0.9485 becomes 6'b11.0001 after conversion to fixed point, and the coefficient of determination of the fit is 0.9906.
(5) w is added to the value of log_2 k calculated by the fitting function in step (4), and the sum is multiplied by ln2 with a constant multiplier to obtain the final result of the logarithmic function: ln(f) = ln2·log_2 f = ln2·(w + log_2 k)
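The five steps of the logarithmic circuit can be mimicked in floating point. In this sketch `math.frexp` stands in for the leading-one detector, decoder and shifter, and the constant -0.9485 is the unquantized value quoted in step (4). The function name is hypothetical and no fixed-point rounding is modelled.

```python
import math

LN2 = math.log(2.0)
B2 = -0.9485   # fitting constant quoted in step (4), before quantization

def ln_fit(f):
    """Approximate ln(f) as ln2 * (w + k + B2) with f = 2**w * k, 1 <= k < 2."""
    if f <= 0.0:
        raise ValueError("f must be positive")
    m, e = math.frexp(f)   # f = m * 2**e with 0.5 <= m < 1
    w = e - 1              # exponent of the leading one
    k = 2.0 * m            # scaled remainder in [1, 2)
    return LN2 * (w + k + B2)
```

The absolute error of log_2 k ≈ k + b_2 is largest at the ends of [1, 2), about 0.05, so the error in ln(f) stays below roughly 0.04 in this model.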
The invention adopts function equivalent transformation, power base number and logarithm base number replacement, function fitting, serial accumulation and exponential operation unit multiplexing, realizes complex functions only by using a limited basic operation logic unit, combines and transforms the power function and division of an original function into the combination of the power function and the logarithm function, simultaneously carries out function fitting with controllable precision according to operation characteristics and a data range, saves a large amount of calculation time and an iteration process, and effectively reduces the hardware realization area and power consumption cost by utilizing the serial accumulation and the function unit multiplexing.

Claims (7)

1. A Softmax hardware implementation method of a logic resource limited platform is characterized by comprising the following steps:
1) transforming the original Softmax function
$$r_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$
into:
$$r_i = e^{\,x_i - \ln\left(\sum_{j=1}^{n} e^{x_j}\right)}$$
wherein n is the total number of Softmax inputs; i is the index of the corresponding input and output, i = 1, ..., n;
2) calculating the exponential of each gated input x_1, x_2, ..., x_n, using the counter value as the gating signal;
3) calculating the accumulated sum of the exponentials of the gated inputs x_1, x_2, ..., x_n from step 2):
$$f = \sum_{i=1}^{n} e^{x_i}$$
4) calculating the natural logarithm of the accumulation result from step 3): f_ln = ln(f);
5) calculating, for each input x_1, x_2, ..., x_n, the exponential of the difference between that input and the calculation result of step 4);
6) storing each result r(i) calculated in step 5) in a register to obtain the final total output R.
2. The Softmax hardware implementation method of a logical resource restricted platform according to claim 1, wherein:
the exponential operation in step 2) and step 5) comprises replacing the base e with the base 2, so that the exponential function becomes
$$e^{x_i} = 2^{x_i \cdot \log_2 e}$$
and decomposing x_i·log_2 e:
$$2^{x_i \cdot \log_2 e} = 2^{u + v} = 2^{u} \cdot 2^{v}$$
wherein u represents the integer part of x_i·log_2 e and v represents the fractional part.
3. The Softmax hardware implementation method of a logic resource limited platform according to claim 1, wherein the logarithmic operation in step 4) comprises: replacing the base e of the logarithm with the base 2, so that the logarithmic function becomes ln(f) = ln2·log_2 f, and decomposing log_2 f: ln(f) = ln2·log_2 f = ln2·(w + log_2 k), where w is the index of the highest set bit in the binary representation of f, and k is the remainder variable with k ∈ [1, 2).
4. A Softmax hardware implementation system of a logical resource restricted platform based on any of the methods in claims 1-3, characterized by comprising the following units:
an exponent operation unit: realizing exponential operation through radix number transformation and linear fitting;
a serial accumulation unit: summing the plurality of input exponentiations to the power of the power;
a logarithmic operation unit: and logarithmic operation is realized through base number transformation and linear fitting.
5. The Softmax hardware implementation system of a logic resource limited platform of claim 4, wherein the exponent operation unit comprises two adders, a constant multiplier and a shifter, wherein the shifter supports left and right shifting; the first adder calculates x_i - 0 or x_i - f_ln; the constant multiplier calculates x_i·log_2 e or (x_i - f_ln)·log_2 e; truncating the absolute value to its integer part gives u; the second adder implements the fitting function 2^v ≈ v + b_1, wherein b_1 is a constant; finally, shifting 2^v left or right by u bits gives the result.
6. The Softmax hardware implementation system of a logic resource limited platform according to claim 4, wherein the logarithmic operation unit comprises a leading-one detector, a decoder, a shifter and a constant adder, wherein the shifter supports only right shifting; the unit is used for calculating ln(f): the leading-one detector produces the intermediate value t in which only the leading-one bit of f is 1 and all other bits are 0; the decoder takes t as input and produces the index w of the leading one of f; the right shifter shifts f right by w bits to obtain k; the constant adder implements the fitting function log_2 k ≈ k + b_2, wherein b_2 is a constant; and finally a constant multiplier and an adder calculate the result of the logarithmic operation unit: ln(f) = ln2·(w + k + b_2).
7. The Softmax hardware implementation system of a logic resource constrained platform of claim 4, wherein the serial accumulation unit is operable to accept an accumulation of any n inputs, the accumulation enable being controlled by a counter, the accumulation being stopped when the counter value equals n.
CN202210790639.9A 2022-07-05 2022-07-05 Softmax hardware implementation method and system of logic resource limited platform Pending CN115062768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210790639.9A CN115062768A (en) 2022-07-05 2022-07-05 Softmax hardware implementation method and system of logic resource limited platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210790639.9A CN115062768A (en) 2022-07-05 2022-07-05 Softmax hardware implementation method and system of logic resource limited platform

Publications (1)

Publication Number Publication Date
CN115062768A true CN115062768A (en) 2022-09-16

Family

ID=83203697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210790639.9A Pending CN115062768A (en) 2022-07-05 2022-07-05 Softmax hardware implementation method and system of logic resource limited platform

Country Status (1)

Country Link
CN (1) CN115062768A (en)

Similar Documents

Publication Publication Date Title
CN110070178B (en) Convolutional neural network computing device and method
Gao et al. Design and implementation of an approximate softmax layer for deep neural networks
CN112740171A (en) Multiply and accumulate circuit
CN107305485B (en) Device and method for performing addition of multiple floating point numbers
CN109165006B (en) Design optimization and hardware implementation method and system of Softmax function
US11074041B2 (en) Method and system for elastic precision enhancement using dynamic shifting in neural networks
CN110888623B (en) Data conversion method, multiplier, adder, terminal device and storage medium
CN115407965B (en) High-performance approximate divider based on Taylor expansion and error compensation method
CN111857650B (en) Hardware computing system for realizing arbitrary floating point operation based on mirror image lookup table and computing method thereof
WO2022170811A1 (en) Fixed-point multiply-add operation unit and method suitable for mixed-precision neural network
CN107133012B (en) High-speed self-defined floating point complex divider
CN110187866B (en) Hyperbolic CORDIC-based logarithmic multiplication computing system and method
CN113837365A (en) Model for realizing sigmoid function approximation, FPGA circuit and working method
Takagi Generating a power of an operand by a table look-up and a multiplication
CN107783935B (en) Approximate calculation reconfigurable array based on dynamic precision configurable operation
CN107220025B (en) Apparatus for processing multiply-add operation and method for processing multiply-add operation
Chandra A novel method for scalable VLSI implementation of hyperbolic tangent function
CN110879697B (en) Device for approximately calculating tanh function
CN116933840A (en) Multi-precision Posit encoding and decoding operation device and method supporting variable index bit width
TW202319909A (en) Hardware circuit and method for multiplying sets of inputs, and non-transitory machine-readable storage device
CN115062768A (en) Softmax hardware implementation method and system of logic resource limited platform
CN111984226A (en) Cube root solving device and solving method based on hyperbolic CORDIC
CN107015783B (en) Floating point angle compression implementation method and device
CN114860193A (en) Hardware operation circuit for calculating Power function and data processing method
CN112199072B (en) Data processing method, device and equipment based on neural network layer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination