CN109816105B - Configurable neural network activation function implementation device - Google Patents

Publication number
CN109816105B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910041332.7A
Other languages
Chinese (zh)
Other versions
CN109816105A (en)
Inventor
车德亮
李娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Microelectronic Technology Institute
Mxtronics Corp
Original Assignee
Beijing Microelectronic Technology Institute
Mxtronics Corp
Application filed by Beijing Microelectronic Technology Institute and Mxtronics Corp
Priority to CN201910041332.7A
Publication of CN109816105A
Application granted
Publication of CN109816105B

Abstract

The invention discloses a configurable neural network activation function implementation device which comprises a controller, a sign judgment module, a range detection module, a parameter register, a floating-point multiplier, a floating-point adder, an inverse arithmetic unit, an address generator, a lookup table, a first gating latch and a second gating latch. The sigmoid function or the tanh function is selected by configuring the control signal M. The device is simple in structure and uses a synchronous clock design, which makes timing checking and verification convenient; it is small in area, low in power consumption and convenient to implement on a chip, which enhances its practicality in embedded applications. When the device is used to calculate a neural network activation function, the processing flow is simple and easy to control, and the calculation efficiency of the activation function is improved. The address generator module and the lookup table module can be conveniently expanded to meet changing requirements on tanh calculation precision. The invention is therefore an ideal structure for implementing the activation function of an embedded neural network processor.

Description

Configurable neural network activation function implementation device
Technical Field
The invention relates to a configurable neural network activation function implementation device, and belongs to the technical field of computers.
Background
In recent years, applications of artificial intelligence technology have developed rapidly. In particular, various neural network structures based on machine learning have been applied in data mining, classification, image and speech recognition, and other fields. Demand is growing sharply for improving the intelligent processing capability of embedded application systems by implementing in hardware the neural networks trained for specific intelligent applications. The activation function is the core of the neuron function in a neural network. How to implement the activation function efficiently and concisely has become one of the key technical problems for embedded neural network applications. The activation function is a transcendental function, and its specific hardware implementation is a core technology that is subject to foreign blockade.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a configurable neural network activation function implementation device that is simple in structure, small in area, low in power consumption and convenient to implement on a chip.
The technical solution of the invention is as follows:
a configurable neural network activation function implementation device comprises a controller, a sign judgment module, a range detection module, a parameter register, a floating-point multiplier, a floating-point adder, an inverse arithmetic unit, an address generator, a lookup table, a first gating latch and a second gating latch;
a controller: generating latching control signals and operation completion signals done required in the whole operation data path of different activation functions according to the value of the configuration control signal M;
a sign judgment module: receiving input operation data and judging whether the data is positive or negative; if the data is positive, outputting the data to the range detection module, the address generator and the floating-point multiplier; otherwise, outputting the absolute value of the data to the range detection module, the address generator and the floating-point multiplier, and outputting the sign of the data to the address generator and the first gating latch;
a range detection module: judging which interval the received data value is in, and sending a range interval identification signal to the parameter register;
a parameter register: storing parameters of a linear function for approximating the sigmoid activation function, namely a linear coefficient and an offset; selecting a linear function approaching the sigmoid activation function according to the range interval identification signal, outputting a linear coefficient of the linear function to a floating point multiplier, and outputting the offset of the linear function to a floating point adder;
a floating-point multiplier: extracting the first-order coefficient of the linear function output by the range detection module from the parameter register module, calculating the product of the first-order coefficient of the linear function and the data value, and outputting the product to the floating-point adder;
a floating-point adder: extracting the offset of a linear function output by the range detection module from the parameter register module, calculating the sum of the product and the offset from the floating-point multiplier, and outputting the obtained result to the inverse arithmetic unit and the first gating latch;
an inverse operator: calculating the inverse number of the output result of the floating-point adder, and outputting the inverse number to the first gating latch;
a first gating latch: when the sign of the input operational data is positive, the calculation result from the floating-point adder is output to the second gating latch in a gating mode, and when the sign of the input operational data is negative, the calculation result from the inverse arithmetic unit is output to the second gating latch in a gating mode;
an address generator: generating a 17-bit address value as a lookup table index according to the interval in which the data value output by the sign judgment module lies;
a lookup table: storing the tanh activation function value corresponding to each data interval, searching the tanh activation function value corresponding to the input operation data according to the index of the lookup table in the address generator, and outputting the tanh activation function value to the second gating latch;
a second gating latch: and according to the value of the configuration control signal M, gating and outputting and latching the operation result of the sigmoid activation function or the operation result of the tanh activation function.
M = 1 denotes the sigmoid activation function operation, and the controller generates the 8 latch control signals mcycle[7:1] required in the entire operation data path of the sigmoid activation function; M = 0 denotes the tanh activation function operation, and the controller generates the 4 latch control signals nmcycle[3:1] required in the entire operation data path of the tanh activation function.
The controller comprises D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8, AND gates and21_0, and21_1, and21_2, and21_3, and21_4, and21_5, and21_6, and21_7, and21_8, and21_9 and and21_10, OR gates or21_0 and or21_1, a two-input NOR gate nor21_0, an inverter inv_0 and an inverter inv_1;
the clock input ends clk of the D flip-flop dff _0, the D flip-flop dff _1, the D flip-flop dff _2, the D flip-flop dff _3, the D flip-flop dff _4, the D flip-flop dff _5, the D flip-flop dff _6, the D flip-flop dff _7 and the D flip-flop dff _8 are connected with an external input clock signal clk, the data input end D of the D flip-flop dff _0 is connected with '1' and high potential, and the output end q of the D flip-flop dff _0 outputs a period control signal cycle0 on one hand and is connected with the data input end D of the flip-flop dff _1 on the other hand; the output end q of the D flip-flop dff _1 is connected with the data input end D of the D flip-flop dff _ 2; the output end q of the D flip-flop dff _2 is connected with the data input end D of the D flip-flop dff _ 3; the output end q of the D flip-flop dff _3 is connected with the data input end D of the D flip-flop dff _ 4; the output end q of the D flip-flop dff _4 is connected with the data input end D of the D flip-flop dff _ 5; the output end q of the D flip-flop dff _5 is connected with the data input end D of the D flip-flop dff _ 6; the output end q of the D flip-flop dff _6 is connected with the data input end D of the D flip-flop dff _ 7; the output end q of the D trigger dff _7 is connected with the data input end D of the D trigger dff _ 8;
the input terminal of the inverter inv_0 is connected to the configuration control signal M; the output terminal of the inverter inv_0 is simultaneously connected to the first input terminal of the AND gate and21_0, the first input terminal of the AND gate and21_2 and the first input terminal of the AND gate and21_4; the output terminal q of the D flip-flop dff_1 is simultaneously connected to the second input terminal of the AND gate and21_0 and the first input terminal of the AND gate and21_1; the configuration control signal M is simultaneously connected to the second input terminals of the AND gates and21_1, and21_3, and21_5, and21_6, and21_7, and21_8, and21_9 and and21_10; the output terminal q of the D flip-flop dff_2 is simultaneously connected to the second input terminal of the AND gate and21_2 and the first input terminal of the AND gate and21_3; the output terminal q of the D flip-flop dff_3 is simultaneously connected to the second input terminal of the AND gate and21_4 and the first input terminal of the AND gate and21_5; the output terminal q of the D flip-flop dff_4 is connected to the first input terminal of the AND gate and21_6; the output terminal q of the D flip-flop dff_5 is connected to the first input terminal of the AND gate and21_7; the output terminal q of the D flip-flop dff_6 is connected to the first input terminal of the AND gate and21_8; the output terminal q of the D flip-flop dff_7 is connected to the first input terminal of the AND gate and21_9; and the output terminal q of the D flip-flop dff_8 is connected to the first input terminal of the AND gate and21_10;
the output terminal of the AND gate and21_0 outputs the signal nmcycle[1], the output terminal of the AND gate and21_1 outputs the signal mcycle[1], the output terminal of the AND gate and21_2 outputs the signal nmcycle[2], the output terminal of the AND gate and21_3 outputs the signal mcycle[2], the output terminal of the AND gate and21_4 outputs the signal nmcycle[3], the output terminal of the AND gate and21_5 outputs the signal mcycle[3], the output terminal of the AND gate and21_6 outputs the signal mcycle[4], the output terminal of the AND gate and21_7 outputs the signal mcycle[5], the output terminal of the AND gate and21_8 outputs the signal mcycle[6], and the output terminal of the AND gate and21_9 outputs the signal mcycle[7];
the output terminal of the AND gate and21_6 is connected to the first input terminal of the OR gate or21_1, and the output terminal of the AND gate and21_10 is connected to the second input terminal of the OR gate or21_1; the output terminal of the OR gate or21_1 outputs the operation completion signal done and is also connected to the first input terminal of the two-input NOR gate nor21_0; the input terminal of the inverter inv_1 is connected to the externally input reset signal rst, and the output terminal of the inverter inv_1 is connected to the second input terminal of the two-input NOR gate nor21_0; the output terminal of the two-input NOR gate nor21_0 is simultaneously connected to the reset input terminals rst of the D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8; the output terminal of the AND gate and21_5 is connected to the first input terminal of the OR gate or21_0, the output terminal of the AND gate and21_9 is connected to the second input terminal of the OR gate or21_0, and the output terminal of the OR gate or21_0 outputs the signal result_clk.
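The shift chain and the M gating described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the function name `controller_trace` and the dictionary layout are hypothetical, the flip-flops are assumed to start cleared, and the done, reset and result_clk logic is deliberately left out.

```python
def controller_trace(M, n_clocks):
    """Behavioural sketch of the dff_0..dff_8 chain and the AND gating by M.

    Assumptions (not from the patent text): all flip-flops start at 0 and the
    done/reset feedback is not modelled.
    """
    q = [0] * 9  # outputs of dff_0 .. dff_8
    trace = []
    for _ in range(n_clocks):
        # dff_0 has its D input tied high; every other stage samples the
        # previous stage, so ones ripple down the chain one stage per clock.
        q = [1] + q[:-1]
        mcycle = {i: q[i] & M for i in range(1, 8)}         # and21_1,3,5,6,7,8,9
        nmcycle = {i: q[i] & (1 - M) for i in range(1, 4)}  # and21_0,2,4
        trace.append({"cycle0": q[0], "mcycle": mcycle, "nmcycle": nmcycle})
    return trace
```

In this model, calling `controller_trace(1, 3)` shows mcycle[1] rising on the second clock and mcycle[2] on the third, while every nmcycle signal stays low because M = 1.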
The sign judgment module comprises a data inverse complement arithmetic unit DD1, a 16-bit data latch and a D flip-flop dff_9;
the input terminal DIN of the data inverse complement arithmetic unit DD1 and the input terminal d2 of the 16-bit data latch are simultaneously connected to the input operation data, and the output terminal DOUT of the data inverse complement arithmetic unit DD1 is connected to the input terminal d1 of the 16-bit data latch;
bit 11 of the input operation data, data[11], is simultaneously connected to the control input terminal ctrl of the 16-bit data latch and the data input terminal D of the D flip-flop dff_9;
the cycle control signal cycle0 is simultaneously connected to the clock input terminal clk of the 16-bit data latch and the clock input terminal clk of the D flip-flop dff_9; the externally input reset signal rst is simultaneously connected to the reset input terminal rst of the 16-bit data latch and the reset input terminal rst of the D flip-flop dff_9;
the output terminal d3 of the 16-bit data latch outputs the absolute value data_abs[15:0] of the input operation data, and the output terminal q of the D flip-flop dff_9 outputs the sign neg of the input operation data.
The inverse arithmetic unit comprises a data inverse complement arithmetic unit DD2 and a D flip-flop dff_21;
the input terminal DIN of the data inverse complement arithmetic unit DD2 is connected to the result output by the floating-point adder, and its output terminal DOUT is connected to the data input terminal D of the D flip-flop dff_21;
the clock input terminal clk of the D flip-flop dff_21 is connected to mcycle[5], the reset input terminal rst of the D flip-flop dff_21 is connected to the externally input reset signal rst, and the output terminal q of the D flip-flop dff_21 outputs the inverse number of the output result of the floating-point adder.
The data inverse complement arithmetic unit DD1 and the data inverse complement arithmetic unit DD2 have the same implementation flow, which is as follows:
step 1, denoting the data received at the input terminal DIN as DIN[15:0], calculating the negated complement D_MID[13:0] of the mantissa of DIN[15:0], and then performing step 2, step 3 and step 4 simultaneously to evaluate D_MID[13:0];
step 2, judging whether D_MID[13:0] is 0, and if so, entering step 5;
step 3, judging whether D_MID[13:0] has overflowed: if D_MID[13] xor D_MID[12] is 1, D_MID[13:0] has overflowed, and step 6 is entered;
step 4, if D_MID[13] xor D_MID[12] is 0, entering step 7;
step 5, setting D_MID[17:14] to the most negative value to obtain the correct representation of 0, and then performing step 8, step 9 and step 10 simultaneously to evaluate D_MID[17:14];
step 6, shifting D_MID[13:0] right by one bit and increasing the exponent by 1, D_MID[17:14] = DIN[15:12] + 1, and then performing step 8, step 9 and step 10 simultaneously to evaluate D_MID[17:14];
step 7, shifting D_MID[13:0] left by K bits and decreasing the exponent by K, D_MID[17:14] = DIN[15:12] - K, and then performing step 8, step 9 and step 10 simultaneously to evaluate D_MID[17:14];
step 8, judging whether the exponent result D_MID[17:14] overflows in the positive direction, and if so, entering step 11;
step 9, judging whether the exponent result D_MID[17:14] overflows in the negative direction, and if so, entering step 12;
step 10, judging whether the exponent result D_MID[17:14] is within range, and if so, entering step 13;
step 11, if D_MID[13:0] > 0, setting D_MID to the maximum positive floating-point value; if D_MID[13:0] < 0, setting D_MID to the maximum negative floating-point value; then entering step 13;
step 12, setting D_MID[13:0] to 0 and D_MID[17:14] to -8, and entering step 13;
step 13, removing the sign extension bit and the hidden bit from the 18-bit operation data D_MID to obtain the 16-bit floating-point data DOUT[15:0] output at the DOUT terminal, where DOUT[15:12] = D_MID[17:14], DOUT[11] = D_MID[12] and DOUT[10:0] = D_MID[10:0].
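The thirteen steps above can be sketched in software as follows. This is a minimal illustration, not the patented circuit, and it rests on assumptions the text does not state explicitly: the hidden mantissa bit is taken to be the complement of the sign bit, an exponent of -8 is taken to encode zero, K is taken to be the shift count needed to renormalize, and the helper names `split`, `pack`, `negate` and `to_float` are hypothetical.

```python
def split(word):
    """Split a 16-bit word into (exponent, signed mantissa with hidden bit)."""
    e = (word >> 12) & 0xF
    if e >= 8:
        e -= 16                    # 4-bit two's-complement exponent
    s = (word >> 11) & 1
    f = word & 0x7FF
    if e == -8:                    # assumption: exponent -8 encodes zero
        return e, 0
    # assumption: hidden bit = NOT sign, so mantissas are 01.f or 10.f
    return e, (f - 4096 if s else f + 2048)

def pack(e, m):
    """Drop sign extension and hidden bit (step 13)."""
    return ((e & 0xF) << 12) | ((1 if m < 0 else 0) << 11) | (m & 0x7FF)

def negate(word):
    e, m = split(word)
    d = -m                         # step 1: negated mantissa
    if d == 0:                     # steps 2 and 5: zero keeps exponent -8
        return pack(-8, 0)
    if d == 4096:                  # steps 3 and 6: top two bits differ, overflow
        d >>= 1
        e += 1
    else:                          # steps 4 and 7: shift left until normalized
        while not (2048 <= d <= 4095 or -4096 <= d <= -2049):
            d <<= 1
            e -= 1
    if e > 7:                      # steps 8 and 11: saturate
        return pack(7, 4095 if d > 0 else -4096)
    if e <= -8:                    # steps 9 and 12: flush to zero
        return pack(-8, 0)
    return pack(e, d)              # steps 10 and 13

def to_float(word):
    e, m = split(word)
    return m / 2048.0 * 2.0 ** e
```

Under these assumptions, negating 0x0000 (1.0) yields 0xF800, which decodes to -1.0 after a one-bit left shift, and negating 0x0800 (-2.0) takes the overflow branch and yields 0x1000 (2.0).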
The range detection module comprises a floating-point comparator fcom _1, a floating-point comparator fcom _2, a floating-point comparator fcom _3, an inverter inv _4, an inverter inv _5, a two-input or gate or21_2, a three-input or gate or31_0, a three-input or gate or31_1, a D flip-flop dff _11, a D flip-flop dff _12, a D flip-flop dff _13 and a D flip-flop dff _ 14;
the first input terminal of the floating-point comparator fcom_1 is connected to data_abs[15:0], and the second input terminal is connected to a constant C0, where C0 = 5.0000 (decimal);
the first input terminal of the floating-point comparator fcom_2 is connected to data_abs[15:0], and the second input terminal is connected to a constant C1, where C1 = 1.0000 (decimal);
the first input terminal of the floating-point comparator fcom_3 is connected to data_abs[15:0] output by the sign judgment module, and the second input terminal is connected to a constant C2, where C2 = 1.0000 (decimal);
The output end of the floating-point comparator fcom _1 is simultaneously connected with the data input end D of the D flip-flop dff _11 and the input end of the inverter inv _3, the output end of the inverter inv _3 is simultaneously connected with the first input end of the two-input OR gate 21_2, the first input end of the three-input OR gate 31_0 and the first input end of the three-input OR gate 31_1, and the output end of the floating-point comparator fcom _2 is simultaneously connected with the input end of the inverter inv _4 and the second input end of the two-input OR gate 21_ 2; the output end of the two-input or gate or21_2 is connected with the data input end D of the D flip-flop dff _ 12; the output terminal of the inverter inv _4 is simultaneously connected to the second input terminal of the three-input or gate 31_0 and the second input terminal of the three-input or gate 31_ 1; the output end of the floating-point comparator fcom _3 is simultaneously connected with the third input end of the three-input OR gate 31_0 and the input end of the inverter inv _5, the output end of the inverter inv _5 is connected with the third input end of the three-input OR gate 31_1, the output end of the three-input OR gate 31_0 is connected with the data input end D of the D flip-flop dff _13, and the output end of the three-input OR gate 31_1 is connected with the data input end D of the D flip-flop dff _ 14;
the reset input ends rst of the D flip-flops dff _11, dff _12, dff _13 and dff _14 are connected with an externally input reset signal rst, mcycle [1] is connected with the clock input ends clk of the D flip-flops dff _11, dff _12, dff _13 and dff _14, the q output end q of the D flip-flop dff _14 outputs a signal range [0], the q output end q of the D flip-flop dff _13 outputs a signal range [1], the q output end q of the D flip-flop dff _12 outputs a signal range [2], and the q output end q of the D flip-flop dff _11 outputs a signal range [3 ].
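The gate network of the range detection module can be modelled as below. This is a sketch only: the function names are hypothetical, the flip-flops are omitted, the comparator is reduced to the output convention described for the floating-point comparators (1 when the input is smaller than the constant), and the thresholds are passed in as parameters rather than fixed.

```python
def fcom(x, c):
    """Comparator convention assumed here: output 1 when x < c, otherwise 0."""
    return 1 if x < c else 0

def range_detect(x_abs, c0, c1, c2):
    """Return (range[3], range[2], range[1], range[0]) before latching."""
    f1, f2, f3 = fcom(x_abs, c0), fcom(x_abs, c1), fcom(x_abs, c2)
    n1, n2, n3 = 1 - f1, 1 - f2, 1 - f3   # inverter outputs
    r3 = f1                               # D input of dff_11
    r2 = n1 | f2                          # or21_2
    r1 = n1 | n2 | f3                     # or31_0
    r0 = n1 | n2 | n3                     # or31_1
    return (r3, r2, r1, r0)
```

With three distinct descending thresholds this logic produces a code in which exactly one bit is 0, marking the interval that contains the input. For example, with the purely illustrative thresholds 5, 2 and 1, an input of 3 gives (1, 0, 1, 1).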
The address generator comprises four 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen 4;
the cin terminal of addrgen1 is grounded, the cout terminal of addrgen1 is connected to the cin terminal of addrgen2, the cout terminal of addrgen2 is connected to the cin terminal of addrgen3, and the cout terminal of addrgen3 is connected to the cin terminal of addrgen4; the clk terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to nmcycle[1]; the rst terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to the externally input reset signal rst; the d[15:0] terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to the data output by the sign judgment module; the d3[15:0] terminal of addrgen1 is connected to the floating-point number corresponding to the decimal constant 8, the d2[15:0] terminal to the floating-point number corresponding to 7.5, the d1[15:0] terminal to the floating-point number corresponding to 7, and the d0[15:0] terminal to the floating-point number corresponding to 6.5; the output out[3:0] of addrgen1 outputs address bits addr[16:13];
the d3[15:0] terminal of addrgen2 is connected to the floating-point number corresponding to the decimal constant 6, the d2[15:0] terminal to the floating-point number corresponding to 5.5, the d1[15:0] terminal to the floating-point number corresponding to 5, and the d0[15:0] terminal to the floating-point number corresponding to 4.5; the output out[3:0] of addrgen2 outputs address bits addr[12:9];
the d3[15:0] terminal of addrgen3 is connected to the floating-point number corresponding to the decimal constant 4, the d2[15:0] terminal to the floating-point number corresponding to 3.5, the d1[15:0] terminal to the floating-point number corresponding to 3, and the d0[15:0] terminal to the floating-point number corresponding to 2.5; the output out[3:0] of addrgen3 outputs address bits addr[8:5];
the d3[15:0] terminal of addrgen4 is connected to the floating-point number corresponding to the decimal constant 2, the d2[15:0] terminal to the floating-point number corresponding to 1.5, the d1[15:0] terminal to the floating-point number corresponding to 1, and the d0[15:0] terminal to the floating-point number corresponding to 0.5; the output out[3:0] of addrgen4 outputs address bits addr[4:1], and the cout terminal of addrgen4 outputs addr[0].
The 4-bit address generating modules addrgen1, addrgen2, addrgen3 and addrgen4 are identical in structure and respectively comprise a floating-point comparator fcom _4, a floating-point comparator fcom _5, a floating-point comparator fcom _6, a floating-point comparator fcom _7, an inverter inv _6, an inverter inv _7, an inverter inv _8, an inverter inv _9, a two-input or gate or21_3, a three-input or gate or31_2, a four-input or gate or41_0, a five-input or gate or51_0, a five-input or gate or51_1, a D trigger dff _16, a D trigger dff _17, a D trigger dff _18 and a D trigger dff _ 19;
the floating-point comparator fcom _4, the floating-point comparator fcom _5, the floating-point comparator fcom _6 and the floating-point comparator fcom _7 are provided with two input ends;
the first input end of the floating-point comparator fcom _4 is the d [15:0] end of the 4-bit address generation module, and the second input end is the d3[15:0] end of the 4-bit address generation module;
the first input end of the floating-point comparator fcom _5 is connected with the first input end of the floating-point comparator fcom _4, and the second input end is used as the d2[15:0] end of the 4-bit address generating module;
the first input end of the floating-point comparator fcom _6 is connected with the first input end of the floating-point comparator fcom _4, and the second input end is used as the d1[15:0] end of the 4-bit address generating module;
the first input end of the floating-point comparator fcom _7 is connected with the first input end of the floating-point comparator fcom _4, and the second input end is used as the d0[15:0] end of the 4-bit address generating module;
a first input end of the two-input or gate 21_3, a first input end of the three-input or gate 31_2, a first input end of the four-input or gate 41_0, a first input end of the five-input or gate 51_0 and a first input end of the five-input or gate 51_1 are connected and then serve as a cin end of the 4-bit address generation module;
the output end of the floating-point comparator fcom _4 is simultaneously connected with the input end of the inverter inv _6 and the second input end of the two-input or gate 21_3, and the output end of the two-input or gate 21_3 is connected with the input end D of the D flip-flop dff _ 16;
the output end of the floating-point comparator fcom _5 is simultaneously connected with the input end of the inverter inv _7 and the third input end of the three-input or gate or31_2, the output end of the inverter inv _6 is connected with the second input end of the three-input or gate or31_2, and the output end of the three-input or gate or31_2 is connected with the input end D of the D flip-flop dff _ 17;
the output end of the floating-point comparator fcom _6 is simultaneously connected with the input end of the inverter inv _8 and the fourth input end of the four-input or gate or41_0, the output end of the inverter inv _6 is connected with the second input end of the four-input or gate or41_0, the output end of the inverter inv _7 is connected with the third input end of the four-input or gate or41_0, and the output end of the four-input or gate or41_0 is connected with the input end D of the D flip-flop dff _ 18;
the output end of the floating-point comparator fcom _7 is simultaneously connected with the input end of the inverter inv _9 and the fifth input end of the five-input or gate or51_0, the output end of the inverter inv _6 is connected with the second input end of the five-input or gate or51_0, the output end of the inverter inv _7 is connected with the third input end of the five-input or gate or51_0, the output end of the inverter inv _8 is connected with the fourth input end of the five-input or gate or51_0, and the output end of the five-input or gate or51_0 is connected with the input end D of the D flip-flop dff _ 19;
the output end of the inverter inv _6 is connected with the second input end of the five-input OR gate 51_1, the output end of the inverter inv _7 is connected with the third input end of the five-input OR gate 51_1, the output end of the inverter inv _8 is connected with the fourth input end of the five-input OR gate 51_1, the output end of the inverter inv _9 is connected with the fifth input end of the five-input OR gate 51_1, and the output end of the five-input OR gate 51_1 serves as the cout end of the 4-bit address generation module;
the reset input end rst of the D flip-flop dff _16, the D flip-flop dff _17, the D flip-flop dff _18 and the D flip-flop dff _19 is connected and then serves as the rst end of the 4-bit address generation module, the clock input end clk is connected and then serves as the clk end of the 4-bit address generation module, the D flip-flop dff _16 outputs the 3 rd bit address, the D flip-flop dff _17 outputs the 2 nd bit address, the D flip-flop dff _18 outputs the 1 st bit address, and the D flip-flop dff _19 outputs the 0 th bit address.
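A behavioural sketch of one 4-bit address generation module and of the four-module chain may help in following the wiring. The function names are hypothetical, the flip-flops are omitted, and the comparator is again assumed to output 1 when its first input is smaller than the constant.

```python
def fcom(x, c):
    """Comparator convention assumed here: output 1 when x < c, otherwise 0."""
    return 1 if x < c else 0

def addrgen(x, cin, d3, d2, d1, d0):
    """One 4-bit module: returns ([out3, out2, out1, out0], cout)."""
    f4, f5, f6, f7 = (fcom(x, d) for d in (d3, d2, d1, d0))
    n4, n5, n6, n7 = 1 - f4, 1 - f5, 1 - f6, 1 - f7   # inv_6 .. inv_9
    out3 = cin | f4                      # or21_3
    out2 = cin | n4 | f5                 # or31_2
    out1 = cin | n4 | n5 | f6            # or41_0
    out0 = cin | n4 | n5 | n6 | f7       # or51_0
    cout = cin | n4 | n5 | n6 | n7       # or51_1
    return [out3, out2, out1, out0], cout

def address(x_abs):
    """17-bit address as a list, element 0 being addr[16]."""
    thresholds = [8 - 0.5 * i for i in range(16)]   # 8, 7.5, ..., 0.5
    bits, cin = [], 0                                # cin of addrgen1 grounded
    for k in range(4):
        out, cin = addrgen(x_abs, cin, *thresholds[4 * k:4 * k + 4])
        bits += out
    bits.append(cin)                                 # cout of addrgen4 is addr[0]
    return bits
```

In this model every input produces exactly one 0 among the 17 address bits, so the address behaves as an active-low one-hot selector over the 17 intervals bounded by 0.5, 1, ..., 8.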
The working flows of the floating-point comparator fcom _1, the floating-point comparator fcom _2, the floating-point comparator fcom _3, the floating-point comparator fcom _4, the floating-point comparator fcom _5, the floating-point comparator fcom _6 and the floating-point comparator fcom _7 are the same, and the working flow of each floating-point comparator is as follows:
step 1, comparing the mantissa of the input data at the first input terminal with the mantissa of the constant at the second input terminal:
adding the hidden mantissa bit according to the floating-point data format and extending the sign bit;
subtracting the mantissa of the constant at the second input terminal from the mantissa of the input data at the first input terminal, and recording the result as dm[13:0];
step 2, comparing the exponent of the input data at the first input terminal with the exponent of the constant at the second input terminal:
extending the sign bit, and subtracting the exponent of the constant at the second input terminal from the exponent of the input data at the first input terminal, the result being recorded as de[4:0];
step 3, determining the sign of the mantissa comparison result dm[13:0]:
sdm is calculated using the following formula: sdm = dm[13] xor dm[12];
when sdm is 0, dm[13:0] is positive; when sdm is 1, dm[13:0] is negative;
step 4, determining the sign of the exponent comparison result de[4:0]:
sde is calculated using the following formula: sde = de[4] xor de[3];
when sde is 0, de[4:0] is positive; when sde is 1, de[4:0] is negative;
step 5, judging the comparison result: if sde is 0 and de[4:0] is not 0, the input data at the first input terminal is greater than the constant at the second input terminal, and the output is 0; if sde is 0, de[4:0] is 0 and sdm is 0, the input data at the first input terminal is greater than the constant at the second input terminal, and the output is 0; if sde is 0, de[4:0] is 0 and sdm is 1, the input data at the first input terminal is less than the constant at the second input terminal, and the output is 1; if sde is 1, the input data at the first input terminal is less than the constant at the second input terminal, and the output is 1.
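The decision in step 5 amounts to an exponent-first, mantissa-second comparison. The sketch below expresses only that decision; it assumes both operands are non-negative normalized values already split into integer exponent and mantissa fields, it uses ordinary signed arithmetic in place of the bit-level sign tests, and the function name `fcom_fields` is hypothetical.

```python
def fcom_fields(exp_x, man_x, exp_c, man_c):
    """Return 0 when x >= c and 1 when x < c, following the step 5 ordering.

    Assumption: operands are non-negative and normalized, so a larger
    exponent always means a larger value.
    """
    de = exp_x - exp_c      # exponent difference (step 2)
    dm = man_x - man_c      # mantissa difference (step 1)
    if de > 0:              # sde = 0 and de != 0
        return 0
    if de == 0:             # exponents equal: mantissa sign decides
        return 0 if dm >= 0 else 1
    return 1                # sde = 1
```

Note that equal operands give an output of 0, matching the case where de is 0 and sdm is 0.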
The parameter register comprises four 32-bit tri-state control gates and a 32-bit D flip-flop dff_15, wherein the control signal terminal of the j-th 32-bit tri-state control gate is connected to the j-th bit of the range interval identification signal output by the range detection module, j = 0, 1, 2, 3; the reset signal terminals of the four 32-bit tri-state control gates are all connected to the externally input reset signal rst, and the output terminals of the four 32-bit tri-state control gates are all connected to the data input terminal D of the 32-bit D flip-flop dff_15; the reset input terminal rst of the 32-bit D flip-flop dff_15 is connected to the externally input reset signal rst, and its clock input terminal clk is connected to mcycle[2]; the high 16 bits output at the output terminal q of the 32-bit D flip-flop dff_15 are the first-order coefficient A[15:0], and the low 16 bits are the offset B[15:0].
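The selection of one coefficient pair followed by the multiply and add can be sketched as follows. Several things here are assumptions rather than statements from the text: the range signal is treated as active low (the bit that is 0 enables its tri-state gate), ordinary Python floats stand in for the 16-bit format, and the coefficient table `PARAMS` holds purely illustrative values, since the stored coefficients are not listed.

```python
# Hypothetical (A, B) pairs indexed by range bit j; illustrative values only.
PARAMS = {
    3: (0.0, 1.0),
    2: (0.03125, 0.84375),
    1: (0.125, 0.625),
    0: (0.25, 0.5),
}

def select_and_evaluate(range_bits, x_abs):
    """Pick the (A, B) pair whose range bit is 0, then return A * x + B.

    range_bits maps j (0..3) to the j-th bit of the range signal.
    Assumption: exactly one bit is 0 and that bit enables its gate.
    """
    selected = [j for j in range(4) if range_bits[j] == 0]
    if len(selected) != 1:
        raise ValueError("expected exactly one active-low range bit")
    a, b = PARAMS[selected[0]]
    product = a * x_abs        # floating-point multiplier
    return product + b         # floating-point adder
```

For instance, with range bit 0 low and an input of 0.5, the illustrative table gives 0.25 * 0.5 + 0.5 = 0.625.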
The lookup table comprises 33 16-bit tri-state control gates, 33 two-input OR gates, 16 inverters and 1 16-bit D flip-flop dff _ 20;
the second input terminals of the first 17 two-input OR gates are all connected to the sign output by the sign judgment module, and among the first 17 two-input OR gates, the first input terminal of the k1-th two-input OR gate is connected to the k1-th bit of the address value generated by the address generator, k1 = 0, 1, 2, ..., 16;
the last 16 two-input OR gates correspond one-to-one to the 16 inverters; the second input terminals of the last 16 two-input OR gates are each connected to the output terminal of the corresponding inverter, and the input terminals of the 16 inverters are all connected to the sign output by the sign judgment module; among the last 16 two-input OR gates, the first input terminal of the k2-th two-input OR gate is connected to the (k2 - 16)-th bit of the address value generated by the address generator, k2 = 17, 18, 19, ..., 32;
The output end of each two-input OR gate is connected with the second input end of the corresponding 16-bit tri-state control gate, the first input ends of 33 16-bit tri-state control gates and the reset input end rst of the 16-bit D flip-flop dff _20 are connected with an external input reset signal rst, the output ends of 33 16-bit tri-state control gates are connected together and then connected with the data input end D of the 16-bit D flip-flop dff _20, the clock input end clk of the 16-bit D flip-flop dff _20 is connected with nmcycle [2], and the output end q of the 16-bit D flip-flop dff _20 outputs the tanh activation function value lut _ data [15:0] corresponding to the input operational data.
The first gating latch comprises 16 gating latch units with the same structure;
the p-th gating latch unit comprises a selector mux _1, an inverter inv _2 and a D flip-flop dff _22, wherein the input end A of the selector mux _1 is connected with the p-th bit of the calculation result of the floating-point adder, the input end B of the selector mux _1 is connected with the p-th bit of the calculation result of the inverse operator, the output end of the selector mux _1 is connected with the input end D of the D flip-flop dff _22, the input end of the inverter inv _2 and the AS end of the selector mux _1 are simultaneously connected with the data symbol output by the symbol judgment module, and the output end of the inverter inv _2 is connected with the BS end of the selector mux _ 1; the clock input clk of the D flip-flop dff _22 is connected to mcycle [6], the reset input rst of the D flip-flop dff _22 is connected to an externally input reset signal, and the output of the D flip-flop dff _22 outputs the p-th bit output data of the first gated latch, where p is 0,1,2, 3, … … 15.
The second gating latch comprises 16 gating latch units with the same structure; the q-th gating latch unit comprises a selector mux_2, an inverter inv_2' and a D flip-flop dff_23, wherein the input end A of the selector mux_2 is connected with the q-th bit of the output result of the lookup table, the input end B of the selector mux_2 is connected with the q-th bit of the output result of the first gating latch, the output end of the selector mux_2 is connected with the input end D of the D flip-flop dff_23, the input end of the inverter inv_2' and the AS end of the selector mux_2 are simultaneously connected with the configuration control signal M, and the output end of the inverter inv_2' is connected with the BS end of the selector mux_2; the clock input end clk of the D flip-flop dff_23 is connected with a period control signal of the controller, the reset input end rst of the D flip-flop dff_23 is connected with the externally input reset signal, and the output end of the D flip-flop dff_23 outputs the q-th bit of the output data of the second gating latch.
The invention has the following advantages:
(1) the invention adopts the table look-up to calculate the tanh function value and the linear function fitting to calculate the sigmoid function value, and the two function calculation modes have simple and convenient structures; meanwhile, partial functional modules in the operation data path of the sigmoid function and the tanh function can be shared according to the time sequence control, and compared with the respective independent implementation, the implementation area is reduced; the functional modules in the operation data path of the sigmoid function and the tanh function have dynamic power consumption only when the corresponding periodic control signals are effective, and have no dynamic power consumption at other times, so that the power consumption of the whole circuit is low; in addition, the invention adopts the design of synchronous clock, is convenient for the time sequence check and verification, and enhances the practicability of embedded application.
(2) When the neural network activation function is calculated by using the method, the generation of the function value can be controlled only by giving a one-bit mode control rule, a clock and a reset signal, and the control flow is simple; if the device of the invention is used for calculating only one type of function value, the device forms a calculation production line, and the efficiency of calculating the neural network activation function can be improved.
(3) The invention can conveniently expand the address generator module and the lookup table module to meet the requirement of function precision transformation according to the requirement of tanh calculation precision.
Drawings
FIG. 1 is a 16-bit floating point data format;
FIG. 2 is a component structure of the present invention;
FIG. 3 is a controller architecture;
FIG. 4 is a block diagram of a symbol decision module;
FIG. 5 is a flow chart of a data phase complement operation;
FIG. 6 is a structure of a one-bit strobe latch, in which (a) is a schematic diagram of a one-bit structure of a first strobe latch, and (b) is a schematic diagram of a one-bit structure of a second strobe latch;
FIG. 7 is a structure of a range detection module;
FIG. 8 is a floating point comparator flow diagram;
FIG. 9 is a parameter register structure of 4 words of 32 bits in length;
FIG. 10 is a flow chart of a floating-point multiplier operation;
FIG. 11 is a flow diagram of a floating-point adder operation;
FIG. 12 is an address generator structure;
FIG. 13 is a 4-bit address generation block structure;
FIG. 14 is a look-up table structure;
fig. 15 shows a structure of a reverse operator.
Detailed Description
For a more clear understanding of the present invention, reference is now made to the following detailed description taken in conjunction with the accompanying drawings.
According to the characteristics of neural network data operations, the data processed by the invention uses the following format: each datum is a 16-bit binary floating-point number; bits [15:12] are the exponent field, bit 11 is the sign bit, an implicit bit lies between bit 11 and bit 10, and bits [10:0] are the fraction; each field of the 16-bit floating-point number is represented in two's complement, as shown in FIG. 1. During an operation the implicit bit between bit 11 and bit 10 is added to the operand; it is the most significant non-sign bit and is located immediately to the left of the binary point. In this floating-point format the two's-complement floating-point value x is given by:
x = 01.f × 2^e, if s = 0
x = 10.f × 2^e, if s = 1
x = 0, if e = -8
the reserved values given below must be used to represent 0 in this short floating point format:
e=-8
s=0
f=0
range and precision of the floating point format:
Maximum positive number: x = (2 - 2^-11) × 2^7 = 2.5594 × 10^2
Minimum positive number: x = 1 × 2^-7 = 7.8125 × 10^-3
Minimum negative number: x = (-1 - 2^-11) × 2^-7 = -7.8163 × 10^-3
Maximum negative number: x = -2 × 2^7 = -2.5600 × 10^2
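As an illustration of the format above, the following minimal Python sketch decodes a 16-bit word into an ordinary number and reproduces the four range limits. The function name decode and the hexadecimal test words are illustrative assumptions and do not appear in the specification; the sketch takes the two bits to the left of the binary point to be 01 for s = 0 and 10 for s = 1, as in the value formula.

```python
def decode(word):
    """Decode a 16-bit word of the short floating-point format to a float.

    Hypothetical helper for illustration only; not part of the device.
    """
    e = (word >> 12) & 0xF      # exponent field, bits [15:12]
    if e >= 8:                  # 4-bit two's complement -> signed value
        e -= 16
    s = (word >> 11) & 1        # sign bit, bit 11
    f = word & 0x7FF            # fraction, bits [10:0]
    if e == -8:                 # reserved exponent: the value is zero
        return 0.0
    # Mantissa is 01.f for s = 0 and 10.f for s = 1 (two's complement).
    mantissa = (1.0 if s == 0 else -2.0) + f / 2048.0
    return mantissa * 2.0 ** e


print(decode(0x77FF))   # maximum positive number
print(decode(0x9000))   # minimum positive number
print(decode(0x9FFF))   # minimum negative number
print(decode(0x7800))   # maximum negative number
print(decode(0x8000))   # reserved representation of zero
```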
On this basis, the invention provides a configurable neural network activation function implementation device which, as shown in FIG. 2, mainly comprises a controller and the operation data paths of the sigmoid function and the tanh function. The device has 4 input signals: the external input clock signal clk, the external input reset signal rst, the configuration control signal M, and the value DATA[15:0] of the activation function argument (the input operation data, which is 16-bit floating-point data in the format described above); and 2 output signals: the activation function operation result result[15:0] (16-bit floating-point data) and the status signal DONE indicating that the activation function operation is complete.
When the sigmoid activation function operation is carried out, the configuration control signal M is set to 1; the configuration control signal M, the external input clock signal clk and the external input reset signal rst are input into the controller, which generates the control signals of the 8 clock cycles needed to complete the sigmoid activation function operation data path: cycle0, mcycle[6:1] and result_clk. The operation data path of the sigmoid activation function consists of the sign judgment module, the range detection module, the parameter register, the floating-point multiplier, the floating-point adder, the inverse operator, the first gating latch and the second gating latch.
The sigmoid activation function operation process is as follows: the value DATA[15:0] of the sigmoid activation function argument first enters the sign judgment module, which judges whether the input operation data is positive or negative; if the input data is positive, the data itself is output, otherwise the opposite number of the data, i.e. its absolute value, is output; the result is data_abs[15:0] together with the sign neg of the data, where neg = 1 indicates that DATA is negative and neg = 0 indicates that DATA is positive; data_abs[15:0] and neg are both latched by the period control signal cycle0. The output data_abs[15:0] of the sign judgment module is input to the range detection module; the value range of data_abs[15:0] is divided into 4 value intervals, and range detection determines which interval data_abs[15:0] lies in; the interval determines which approximate linear function is used to estimate the value of the sigmoid activation function, and determining the interval also forms the register address range[3:0] of the linear function coefficient and offset; the period control signal mcycle[1] latches the output range[3:0]. range[3:0] is output to the parameter register to select the parameters of the first-order function approximating the sigmoid activation function, i.e. the first-order coefficient and the offset; the parameter register is 32 bits wide, the upper 16 bits being the first-order coefficient A[15:0] and the lower 16 bits being the offset B[15:0]; the period control signal mcycle[2] latches the output of the parameter register, i.e. A[15:0] and B[15:0]. A[15:0] and data_abs[15:0] are input into the floating-point multiplier for floating-point multiplication, and the period control signal mcycle[3] latches the output AX[15:0] of the floating-point multiplier. AX[15:0] and the offset B[15:0] output by the parameter register are input to the floating-point adder for floating-point addition, and the period control signal mcycle[4] latches the output Y[15:0] of the floating-point adder. Y[15:0] is input to the inverse operator, which calculates the inverse value NY[15:0] of Y[15:0]; the period control signal mcycle[5] latches the output NY[15:0] of the inverse operator. NY[15:0] and Y[15:0] are input to the first gating latch, which gates its output according to the data sign neg output by the sign judgment module: when neg = 0 the result Y[15:0] of the floating-point adder is gated out, and when neg = 1 the result NY[15:0] of the inverse operator is gated out; the period control signal mcycle[6] latches the output sigmoid_result[15:0] of the first gating latch. sigmoid_result[15:0] is input into the second gating latch, which gates and latches either the sigmoid operation result or the tanh operation result according to the value of the configuration control signal M; when M = 1, sigmoid_result[15:0] is gated out, and the period control signal result_clk latches the output result[15:0] of the second gating latch, which is the output of the sigmoid activation function. The controller then outputs DONE (active high), indicating that the sigmoid activation function operation is complete.
When the tanh activation function operation is carried out, the configuration control signal M is set to 0; the configuration control signal M, the external input clock signal clk and the reset signal rst are input into the controller, which generates the control signals of the 4 clock cycles needed to complete the tanh activation function operation data path: cycle0, nmcycle[2:1] and result_clk. The tanh activation function operation data path consists of the sign judgment module, the address generation module, the lookup table module and the second gating latch.
The tanh activation function operation process is as follows: the value DATA[15:0] of the tanh activation function argument first enters the sign judgment module, which judges whether the input operation data is positive or negative; if the input data is positive, the data itself is output, otherwise the opposite number of the data, i.e. its absolute value, is output; the result is data_abs[15:0] together with the sign neg of the data, where neg = 1 indicates that DATA is negative and neg = 0 indicates that DATA is positive; data_abs[15:0] and neg are both latched by the period control signal cycle0. The output data_abs[15:0] of the sign judgment module is input to the address generation module; the value range of data_abs[15:0] is divided into 2^n value intervals (n is determined by the accuracy requirement of the tanh function; n is 4 in this description); address generation must quickly determine which interval data_abs[15:0] lies in and, according to that interval, form the lookup table address addr[17:0]; the period control signal nmcycle[1] latches the output addr[17:0]. addr[17:0] is output to the lookup table to select the function value of the tanh activation function stored there; the period control signal nmcycle[2] latches the output of the tanh function value. The tanh function value tanh_result[15:0] is input into the second gating latch, which gates and latches its output according to the value of the configuration control signal M; when M = 0, the function value of the tanh activation function is gated out, and the period control signal result_clk latches the output result[15:0] of the second gating latch, which is the output of the tanh activation function operation result. The controller then outputs DONE (active high), indicating that the tanh activation function operation is complete.
Referring to FIG. 2, the sign judgment module and the second gating latch are shared by the operation data paths of the two activation functions.
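The sigmoid data path described above can be summarized by a behavioral sketch in ordinary Python floating point. This is only an illustration under stated assumptions: the interval boundaries and the (A, B) pairs below are those of a widely used piecewise-linear sigmoid approximation (PLAN) and are hypothetical stand-ins for the contents of the parameter register, which this description does not list; the inverse operator is assumed to form 1 - Y, which follows from the identity sigmoid(-x) = 1 - sigmoid(x) but is not confirmed by the text; the name sigmoid_datapath is likewise hypothetical.

```python
import math

# Hypothetical (lower bound, A, B) entries standing in for the parameter
# register. These are the PLAN coefficients, NOT values from the patent.
SEGMENTS = [
    (5.0,   0.0,     1.0),
    (2.375, 0.03125, 0.84375),
    (1.0,   0.125,   0.625),
    (0.0,   0.25,    0.5),
]


def sigmoid_datapath(x):
    """Behavioral model of the sigmoid data path (floats, not the 16-bit format)."""
    neg = x < 0                      # sign judgment module: sign flag neg
    x_abs = -x if neg else x         # ... and absolute value data_abs
    for lower, a, b in SEGMENTS:     # range detection selects one (A, B) pair
        if x_abs >= lower:
            break
    y = a * x_abs + b                # floating-point multiplier, then adder
    ny = 1.0 - y                     # inverse operator (assumed to be 1 - Y)
    return ny if neg else y          # first gating latch selects by neg


# Largest deviation from the exact sigmoid on a coarse grid.
worst = max(abs(sigmoid_datapath(k / 100) - 1 / (1 + math.exp(-k / 100)))
            for k in range(-800, 801))
print(worst)
```

With these assumed coefficients every multiplication is by a power of two or by zero, which is one reason such segment tables are popular in hardware; the actual register contents of the device may differ.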
The input signals of the controller (refer to FIG. 2 and FIG. 3) are the configuration control signal M, the external input clock signal clk and the external input reset signal rst; the output signals are the period control signals cycle0, mcycle[6:1], nmcycle[3:1] and result_clk and the operation completion signal DONE. The functions are as follows: according to the value of the configuration control signal M, the controller generates the period control signals of the operation data path of the selected activation function and the flag signal DONE indicating that the activation function operation is complete; when M = 0, the controller generates the 4 latch control signals of the tanh activation function operation data path; when M = 1, the controller generates the 8 latch control signals of the sigmoid activation function operation data path.
The controller mainly comprises a 9-bit D flip-flop structure in which every bit has the same structure. As shown in FIG. 3, the controller comprises the D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8, the AND gates and21_0, and21_1, and21_2, and21_3, and21_4, and21_5, and21_6, and21_7, and21_8, and21_9 and and21_10, the OR gates or21_0 and or21_1, the two-input NOR gate nor21_0, and the inverters inv_0 and inv_1. The clock input ends clk of the D flip-flops dff_0 to dff_8 are all connected with the external input clock signal clk; the data input end D of the D flip-flop dff_0 is tied to '1' (high potential), and the output end q of the D flip-flop dff_0 outputs the period control signal cycle0 on the one hand and is connected with the data input end D of the D flip-flop dff_1 on the other hand; the output end q of the D flip-flop dff_1 is connected with the data input end D of the D flip-flop dff_2; the output end q of the D flip-flop dff_2 is connected with the data input end D of the D flip-flop dff_3; the output end q of the D flip-flop dff_3 is connected with the data input end D of the D flip-flop dff_4; the output end q of the D flip-flop dff_4 is connected with the data input end D of the D flip-flop dff_5; the output end q of the D flip-flop dff_5 is connected with the data input end D of the D flip-flop dff_6; the output end q of the D flip-flop dff_6 is connected with the data input end D of the D flip-flop dff_7; the output end q of the D flip-flop dff_7 is connected with the data input end D of the D flip-flop dff_8;
the input end of the inverter inv_0 is connected with the configuration control signal M; the output end of the inverter inv_0 is simultaneously connected with the first input ends of the AND gates and21_0, and21_2 and and21_4; the output end q of the D flip-flop dff_1 is simultaneously connected with the second input end of the AND gate and21_0 and the first input end of the AND gate and21_1; the configuration control signal M is simultaneously connected with the second input ends of the AND gates and21_1, and21_3, and21_5, and21_6, and21_7, and21_8, and21_9 and and21_10; the output end q of the D flip-flop dff_2 is simultaneously connected with the second input end of the AND gate and21_2 and the first input end of the AND gate and21_3; the output end q of the D flip-flop dff_3 is simultaneously connected with the second input end of the AND gate and21_4 and the first input end of the AND gate and21_5; the output end q of the D flip-flop dff_4 is connected with the first input end of the AND gate and21_6; the output end q of the D flip-flop dff_5 is connected with the first input end of the AND gate and21_7; the output end q of the D flip-flop dff_6 is connected with the first input end of the AND gate and21_8; the output end q of the D flip-flop dff_7 is connected with the first input end of the AND gate and21_9; and the output end q of the D flip-flop dff_8 is connected with the first input end of the AND gate and21_10;
the output end of the AND gate and21_0 outputs the signal nmcycle[1], the output end of the AND gate and21_1 outputs the signal mcycle[1], the output end of the AND gate and21_2 outputs the signal nmcycle[2], the output end of the AND gate and21_3 outputs the signal mcycle[2], the output end of the AND gate and21_4 outputs the signal nmcycle[3], the output end of the AND gate and21_5 outputs the signal mcycle[3], the output end of the AND gate and21_6 outputs the signal mcycle[4], the output end of the AND gate and21_7 outputs the signal mcycle[5], the output end of the AND gate and21_8 outputs the signal mcycle[6], and the output end of the AND gate and21_9 outputs the signal mcycle[7];
the output end of the AND gate and21_6 is connected with the first input end of the OR gate or21_1, the output end of the AND gate and21_10 is connected with the second input end of the OR gate or21_1, and the output end of the OR gate or21_1 outputs the completion signal DONE on the one hand and is connected with the first input end of the two-input NOR gate nor21_0 on the other hand; the input end of the inverter inv_1 is connected with the externally input reset signal rst, the output end of the inverter inv_1 is connected with the second input end of the two-input NOR gate nor21_0, and the output end of the two-input NOR gate nor21_0 is simultaneously connected with the reset input ends rst of the D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8; the output end of the AND gate and21_5 is connected with the first input end of the OR gate or21_0, the output end of the AND gate and21_9 is connected with the second input end of the OR gate or21_0, and the output end of the OR gate or21_0 outputs the signal result_clk.
The input signals of the symbol judgment module are input DATA DATA [15:0], a period control signal cycle0 and a reset signal rst; the output signals are the positive and negative identification symbols neg of the data and the absolute values data _ abs [15:0 ]. The functions are as follows: when the 11 th bit DATA [11] of the sign bit of the input DATA is equal to 0, the input DATA is positive, and the cycle0 latches the DATA and outputs the DATA when the DATA is valid; when DATA [11] is 1, the input DATA is negative, and when cycle0 is effective, the complement value of the opposite number of the DATA is latched and output; the sign bit DATA [11] of the input DATA, latched and output as neg when cycle0 is active; when the reset signal rst is valid, the output signal data _ abs [15:0] and neg are set to 0.
The structure of the sign judgment module is shown in FIG. 4: the input 16-bit data DATA[15:0] is connected to the input end DIN of the data inverse complement operator DD1 and to the input end D2 of the 16-bit data latch; the output end DOUT of the data inverse complement operator DD1 is connected to the input end D1 of the 16-bit data latch; the 11th bit DATA[11] of the input 16-bit data is connected to the control input ctrl of the latch and to the data input D of the D flip-flop dff_9; the period control signal cycle0 is connected to the clock input clk of the latch and to the clock input clk of the D flip-flop dff_9; the externally input reset signal rst is connected to the reset input rst of the 16-bit data latch and to the reset input rst of the D flip-flop dff_9.
The data inverse complement operator (operation flow shown in FIG. 5) computes the two's complement of the opposite number of a 16-bit floating-point datum DIN[15:0] and requires 13 steps.
Step 1, calculating the complement of the opposite number of the mantissa of the 16-bit floating-point number DIN[15:0]. According to the rules of floating-point operation, DIN[15:0] is extended to 18-bit operation data D[17:0]: the 4-bit exponent DIN[15:12] keeps its bit width and corresponds to D[17:14]; the 12-bit mantissa DIN[11:0] is extended to the 14-bit D[13:0], where D[10:0] = DIN[10:0], D[11] = DIN[11] is the implicit bit added according to the floating-point format, D[12] = DIN[11] is the sign bit of the mantissa, and D[13] = DIN[11] is the one-bit sign extension of the mantissa. After the data extension is completed, D[13:0] is inverted bit by bit and 1 is added at the lowest bit; the operation result is D_MID[13:0]. Steps 2, 3 and 4 are then performed simultaneously to examine D_MID[13:0];
Step 2, judging whether the mantissa operation result D_MID[13:0] is 0; if so, step 5 is entered.
Step 3, judging whether the mantissa operation result D_MID[13:0] overflows: if D_MID[13] xor D_MID[12] is 1, the operation data overflows and step 6 is entered.
Step 4, when D_MID[13:0] does not overflow, i.e. D_MID[13] xor D_MID[12] is 0, step 7 is entered;
Step 5, when the mantissa operation result D_MID[13:0] is 0, D_MID[17:14] is set to the maximum negative value to obtain the correct representation of 0, and steps 8, 9 and 10 are performed simultaneously to examine D_MID[17:14].
Step 6, when the mantissa operation result D_MID[13:0] overflows, the mantissa D_MID[13:0] is shifted right by one bit and 1 is added to the exponent DIN[15:12]; steps 8, 9 and 10 are performed simultaneously to examine D_MID[17:14].
Step 7, the processing when the mantissa operation result D_MID[13:0] does not overflow: the mantissa D_MID[13:0] is shifted left by K bits and K is subtracted from the exponent DIN[15:12]; steps 8, 9 and 10 are performed simultaneously to examine D_MID[17:14].
Step 8, judging whether the exponent operation result D_MID[17:14] overflows positively; if so, step 11 is entered.
Step 9, judging whether the exponent operation result D_MID[17:14] overflows negatively; if so, step 12 is entered.
Step 10, judging whether the exponent operation result D_MID[17:14] is within the representable range; if so, step 13 is entered.
Step 11, if D_MID[13:0] > 0, D_MID is set to the maximum positive floating-point value; if D_MID[13:0] < 0, D_MID is set to the maximum negative floating-point value; step 13 is entered.
Step 12, D_MID[13:0] is set to 0 and D_MID[17:14] is set to -8; step 13 is entered.
Step 13, converting the 18-bit operation data D_MID into 16-bit floating-point format data: the sign extension bit and the implicit bit are removed, and the 16-bit floating-point format data DOUT[15:0] is output, where DOUT[15:12] = D_MID[17:14], DOUT[11] = D_MID[12], DOUT[10:0] = D_MID[10:0].
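The effect of the 13 steps can be checked with a behavioral Python sketch. It is not the 18-bit data path itself: it works on signed Python integers, takes the implicit bit to be the complement of the sign bit (as the value formula 01.f / 10.f of the format implies), treats an input with exponent -8 as zero, and maps a result exponent of -8 to the reserved zero word. The names unpack, decode, negate and the three constants are hypothetical.

```python
ZERO, MAX_POS, MAX_NEG = 0x8000, 0x77FF, 0x7800   # assumed reserved/limit words


def unpack(word):
    """Split a 16-bit word into (signed exponent, signed mantissa in units of 2**-11)."""
    e = (word >> 12) & 0xF
    if e >= 8:
        e -= 16
    s = (word >> 11) & 1
    f = word & 0x7FF
    m = (2048 + f) if s == 0 else (-4096 + f)     # 01.f or 10.f
    return e, m


def decode(word):
    e, m = unpack(word)
    return 0.0 if e == -8 else m * 2.0 ** (e - 11)


def negate(word):
    """Behavioral model of the opposite-number operation with renormalization."""
    e, m = unpack(word)
    if e == -8:                  # zero stays zero (assumption of this sketch)
        return ZERO
    n = -m                       # step 1: two's complement of the mantissa
    if n == 4096:                # step 6: mantissa overflow (+2.0), shift right
        n >>= 1
        e += 1
    elif n == -2048:             # step 7: -1.0 is not normalized, shift left, K = 1
        n <<= 1
        e -= 1
    if e > 7:                    # step 11: positive exponent overflow, saturate
        return MAX_POS if n > 0 else MAX_NEG
    if e <= -8:                  # step 12: exponent underflow, result is zero
        return ZERO
    sign = 1 if n < 0 else 0     # step 13: repack exponent, sign and fraction
    return ((e & 0xF) << 12) | (sign << 11) | (n & 0x7FF)


print(decode(negate(0x2200)))    # 0x2200 encodes 5.0
```

In this model only two mantissa values ever need renormalization after negation (+1.0 and -2.0), so K is at most 1; the general left shift by K bits in step 7 matters for the adder and multiplier rather than for this operator.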
The first gating latch and the second gating latch are both 16-bit parallel structures.
The first strobe latch includes 16 strobe latch units having the same structure.
The p-th gating latch unit comprises a selector mux _1, an inverter inv _2 and a D flip-flop dff _22, wherein the input end A of the selector mux _1 is connected with the p-th bit of the calculation result of the floating-point adder, the input end B of the selector mux _1 is connected with the p-th bit of the calculation result of the inverse operator, the output end of the selector mux _1 is connected with the input end D of the D flip-flop dff _22, the input end of the inverter inv _2 and the AS end of the selector mux _1 are simultaneously connected with the data symbol output by the symbol judgment module, and the output end of the inverter inv _2 is connected with the BS end of the selector mux _ 1; the clock input clk of the D flip-flop dff _22 is connected to mcycle [6], the reset input rst of the D flip-flop dff _22 is connected to an externally input reset signal, and the output of the D flip-flop dff _22 outputs the p-th bit output data of the first gated latch, where p is 0,1,2, 3, … … 15. As shown in fig. 6 (a).
The functions are as follows: when the data symbol neg output by the symbol judging module is equal to 0, selecting D1[ p ], latching strobe data when clk is effective and outputting D3[ p ]; when neg is 1, D2[ p ] is selected, the strobe data is latched when clk is active, and D3[ p ] is output.
The second strobe latch includes 16 strobe latch units having the same structure.
The q-th gating latch unit comprises a selector mux_2, an inverter inv_2' and a D flip-flop dff_23, wherein the input end A of the selector mux_2 is connected with the q-th bit of the output result of the lookup table, the input end B of the selector mux_2 is connected with the q-th bit of the output result of the first gating latch, the output end of the selector mux_2 is connected with the input end D of the D flip-flop dff_23, the input end of the inverter inv_2' and the AS end of the selector mux_2 are simultaneously connected with the configuration control signal M, and the output end of the inverter inv_2' is connected with the BS end of the selector mux_2; the clock input end clk of the D flip-flop dff_23 is connected with a period control signal of the controller, the reset input end rst of the D flip-flop dff_23 is connected with the externally input reset signal, and the output end of the D flip-flop dff_23 outputs the q-th bit of the output data of the second gating latch. As shown in FIG. 6 (b).
The input signals of the range detection module are the timing control signal mcycle[1], the reset signal rst and the 16-bit floating-point data data_abs; the output signal is the 4-bit range signal range[3:0]. The functions are as follows: when the reset signal is active, the constant C0 is set to 5.0000 (decimal), the constant C1 to 1.0000 (decimal) and the constant C2 to 1.0000 (decimal); the input data is simultaneously compared with C0, C1 and C2 by the floating-point comparators; when the input floating-point data data_abs is larger than C0, range[3] is 0, range[2] is 1, range[1] is 1 and range[0] is 1; when data_abs is larger than C1, range[2] is 0, range[3] is 1, range[1] is 1 and range[0] is 1; when data_abs is larger than C2, range[1] is 0, range[3] is 1, range[2] is 1 and range[0] is 1; if data_abs is smaller than C0, C1 and C2, range[3] is 1, range[2] is 1, range[1] is 1 and range[0] is 0. The structure of the range detection module is shown in FIG. 7 and comprises the floating-point comparators fcom_1, fcom_2 and fcom_3, the inverters inv_3, inv_4 and inv_5, the two-input OR gate or21_2, the three-input OR gates or31_0 and or31_1, and the D flip-flops dff_11, dff_12, dff_13 and dff_14;
the first input end of the floating-point comparator fcom_1 is connected with data_abs[15:0], and the second input end is connected with the constant C0, C0 = 5.0000 (decimal);
the first input end of the floating-point comparator fcom_2 is connected with data_abs[15:0], and the second input end is connected with the constant C1, C1 = 1.0000 (decimal);
the first input end of the floating-point comparator fcom_3 is connected with data_abs[15:0], and the second input end is connected with the constant C2, C2 = 1.0000 (decimal);
The output end of the floating-point comparator fcom _1 is simultaneously connected with the data input end D of the D flip-flop dff _11 and the input end of the inverter inv _3, the output end of the inverter inv _3 is simultaneously connected with the first input end of the two-input OR gate 21_2, the first input end of the three-input OR gate 31_0 and the first input end of the three-input OR gate 31_1, and the output end of the floating-point comparator fcom _2 is simultaneously connected with the input end of the inverter inv _4 and the second input end of the two-input OR gate 21_ 2; the output end of the two-input or gate or21_2 is connected with the data input end D of the D flip-flop dff _ 12; the output terminal of the inverter inv _4 is simultaneously connected to the second input terminal of the three-input or gate 31_0 and the second input terminal of the three-input or gate 31_ 1; the output end of the floating-point comparator fcom _3 is simultaneously connected with the third input end of the three-input OR gate 31_0 and the input end of the inverter inv _5, the output end of the inverter inv _5 is connected with the third input end of the three-input OR gate 31_1, the output end of the three-input OR gate 31_0 is connected with the data input end D of the D flip-flop dff _13, and the output end of the three-input OR gate 31_1 is connected with the data input end D of the D flip-flop dff _ 14;
the reset input ends rst of the D flip-flops dff _11, dff _12, dff _13 and dff _14 are connected with an externally input reset signal rst, mcycle [1] is connected with the clock input ends clk of the D flip-flops dff _11, dff _12, dff _13 and dff _14, the q output end q of the D flip-flop dff _14 outputs a signal range [0], the q output end q of the D flip-flop dff _13 outputs a signal range [1], the q output end q of the D flip-flop dff _12 outputs a signal range [2], and the q output end q of the D flip-flop dff _11 outputs a signal range [3 ].
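As a non-limiting illustration, the combinational part of the range detection module described above can be sketched in Python. The function name range_detect and the packing of the four bits into one integer are assumptions made for this sketch; f1, f2 and f3 stand for the outputs of fcom_1, fcom_2 and fcom_3, where 0 means that data_abs is greater than the corresponding constant, and the latching by mcycle[1] is omitted.

```python
def range_detect(f1: int, f2: int, f3: int) -> int:
    """Return range[3:0] as an integer from the three comparator outputs.

    f1, f2, f3 are the outputs of fcom_1, fcom_2, fcom_3
    (0 = data_abs greater than C0, C1, C2 respectively).
    """
    n1, n2, n3 = f1 ^ 1, f2 ^ 1, f3 ^ 1   # inverter outputs
    r3 = f1                                # D input of dff_11
    r2 = n1 | f2                           # two-input OR gate or21_2
    r1 = n1 | n2 | f3                      # three-input OR gate or31_0
    r0 = n1 | n2 | n3                      # three-input OR gate or31_1
    return (r3 << 3) | (r2 << 2) | (r1 << 1) | r0


if __name__ == "__main__":
    # Print the active-low one-hot code for each monotonic comparator pattern.
    for pattern in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
        print(pattern, format(range_detect(*pattern), "04b"))
```

In this model exactly one bit of the result is 0 for every input pattern, which is the property the parameter register relies on when it gates one of its four words.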
The flow of the floating-point comparator is shown in FIG. 8. Comparing the 16-bit data_abs[15:0] with a constant Ci (i = 0, 1, 2) requires 5 steps.
Step 1, comparing the mantissas of data_abs at the first input and of the constant Ci at the second input, where i = 0, 1, 2:
adding the mantissa hidden bit according to the floating-point data format, and at the same time extending the sign bit;
subtracting the mantissa of Ci from the mantissa of data_abs, and recording the result as dm[13:0];
Step 2, comparing the exponents of data_abs at the first input and of the constant Ci at the second input:
extending the sign bit, and subtracting the exponent of Ci from the exponent of data_abs; the result is recorded as de[4:0];
Step 3, determining the sign of the mantissa comparison result dm[13:0]:
sdm is calculated using the following formula: sdm = dm[13] xor dm[12];
when sdm is 0, dm[13:0] is positive; when sdm is 1, dm[13:0] is negative;
Step 4, determining the sign of the exponent comparison result de:
sde is calculated using the following formula: sde = de[4] xor dm[3];
when sde is 0, de[4:0] is positive; when sde is 1, de[4:0] is negative;
Step 5, determining the comparison result: if sde is 0 and de[4:0] is not 0, data_abs at the first input is greater than the constant Ci at the second input, and the output is 0; if sde is 0, de[4:0] is 0 and sdm is 0, data_abs is greater than Ci, and the output is 0; if sde is 0, de[4:0] is 0 and sdm is 1, data_abs is smaller than Ci, and the output is 1; if sde is 1, data_abs is smaller than Ci, and the output is 1.
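The decision rule of step 5 can be illustrated with a short behavioural sketch in Python. It is only a sketch under stated assumptions: the helper names split and fcom are hypothetical, the operands are positive values split into an exponent and a mantissa in [1, 2) with an assumed 11 fraction bits, and the sign flags sde and sdm are taken directly from the signs of the integer differences rather than from the xor formulas of steps 3 and 4.

```python
import math

FRAC_BITS = 11  # assumed number of mantissa bits below the hidden bit


def split(x: float) -> tuple[int, int]:
    """Split a positive number into (exponent, mantissa), mantissa in [1, 2)."""
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    m, e = math.frexp(x)                      # x = m * 2**e with 0.5 <= m < 1
    return e - 1, int(2 * m * (1 << FRAC_BITS))


def fcom(data_abs: float, ci: float) -> int:
    """Return 0 when data_abs is greater than (or equal to) ci, else 1."""
    e_x, m_x = split(data_abs)
    e_c, m_c = split(ci)
    de = e_x - e_c                            # step 2: exponent difference
    dm = m_x - m_c                            # step 1: mantissa difference
    sde = int(de < 0)                         # step 4 (sign taken directly)
    sdm = int(dm < 0)                         # step 3 (sign taken directly)
    if sde == 1:                              # smaller exponent: smaller value
        return 1
    if de != 0:                               # larger exponent: larger value
        return 0
    return sdm                                # equal exponents: mantissa decides
```

Because the exponent difference is examined before the mantissa difference, fcom(8.0, 5.0) yields 0 even though the mantissa of 8.0 is smaller than that of 5.0.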
The input signals of the 4-word, 32-bit parameter register (see FIG. 2, FIG. 9) are the range interval identification signal range[3:0], the clock signal clk and the reset signal rst; the output signals are the first-order coefficient A[15:0] and the offset B[15:0] of the linear function approximating the sigmoid function. Its function is as follows. When rst is active, the parameter register is set: each of the four 32-bit words is set to the first-order coefficient and offset of one of the 4 linear functions corresponding to the 4 intervals into which the argument of the sigmoid function is divided. Listed in the order of the range interval identification signal range[3:0], the values (A, B)₁₀ are: (0.0000, 0.0000)₁₀; (0.0313, 0.8438)₁₀; (0.1250, 0.6250)₁₀; (0.2500, 0.5000)₁₀. The 32-bit word corresponding to the range interval identification signal range is selected to supply the parameters of the linear function approximating the sigmoid activation function, namely the first-order coefficient A[15:0] and the offset B[15:0], and the output is latched according to the clock signal.
The parameter register comprises four 32-bit tri-state control gates and a 32-bit D flip-flop dff_15. The control signal terminal of the j-th 32-bit tri-state control gate is connected to the j-th bit of the range interval identification signal output by the range detection module, where j = 0, 1, 2, 3; the reset signal terminals of the four 32-bit tri-state control gates are all connected to the externally input reset signal rst; the outputs of the four 32-bit tri-state control gates are all connected to the data input D of the 32-bit D flip-flop dff_15; the reset input rst of dff_15 is connected to the externally input reset signal rst, and the clock input clk of dff_15 is connected to the external mcycle[2]. The upper 16 bits of the output q of the 32-bit D flip-flop dff_15 are the first-order coefficient A[15:0], and the lower 16 bits are the offset B[15:0].
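The selection performed by the four tri-state control gates can be sketched in Python as follows. This is an illustrative model only: the name select_params is hypothetical, the (A, B) pairs are copied from the text as decimal values rather than 16-bit words, and it is assumed that the j-th word is gated when range[j] is 0 and that the first listed pair belongs to range[3].

```python
# (A, B) pairs copied from the text, keyed by the range bit assumed to gate them.
PARAMS = {
    3: (0.0000, 0.0000),
    2: (0.0313, 0.8438),
    1: (0.1250, 0.6250),
    0: (0.2500, 0.5000),
}


def select_params(range_bits: int) -> tuple[float, float]:
    """Return the (A, B) pair gated by the single 0 bit of range[3:0]."""
    low = [j for j in range(4) if not (range_bits >> j) & 1]
    if len(low) != 1:
        # Tri-state outputs share one bus, so exactly one gate may drive it.
        raise ValueError("exactly one bit of range[3:0] must be 0")
    return PARAMS[low[0]]
```

For example, select_params(0b1110) returns the pair listed for the smallest interval of the argument.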
The flow of the floating-point multiplier is shown in FIG. 10. Multiplying the two 16-bit floating-point data A[15:0] and data_abs[15:0] requires 5 steps.
Step 1, operand preparation: the 16-bit floating-point data A[15:0] is converted into the 18-bit operand AIN[17:0] as follows:
AIN[17:14] = A[15:12], AIN[13] = A[11], AIN[12] = A[11], AIN[11] = ~A[11], AIN[10:0] = A[10:0];
at the same time, the 16-bit floating-point data data_abs[15:0] is converted into the 18-bit operand XIN[17:0]. If range[3] is equal to 1, XIN[17:0] is formed as follows: XIN[17:14] = data_abs[15:12], XIN[13] = data_abs[11], XIN[12] = data_abs[11], XIN[11] = data_abs[11], XIN[10:0] = data_abs[10:0]; if range[3] is 0, XIN[17:0] is set to the constant 1.
Step 2, multiplication: the mantissas of AIN and XIN are multiplied to obtain the mantissa of AXIN: AXIN[27:0] = AIN[13:0] × XIN[13:0];
the exponents of AIN and XIN are added to obtain the exponent of AXIN: AXIN[31:28] = AIN[17:14] + XIN[17:14].
Step 3, checking the mantissa result AXIN[27:0] of the multiplication: if AXIN[27:14] is equal to 0, then AXIN[31:28] = -8; if AXIN[27:14] >> 1 is a normalized number, then {AXIN[27:14] = AXIN[27:14] >> 1, AXIN[31:28] = AXIN[31:28] + 1}; if AXIN[27:14] >> 2 is a normalized number, then {AXIN[27:14] = AXIN[27:14] >> 2, AXIN[31:28] = AXIN[31:28] + 2}; otherwise AXIN[27:14] is already a normalized number.
Step 4, checking the exponent result AXIN[31:28] of the multiplication: if AXIN[31:28] overflows and AXIN[27:14] > 0, then {AXIN[31:28] = 7}; if AXIN[31:28] overflows and AXIN[27:14] < 0, then {AXIN[31:28] = 8}; if AXIN[31:28] underflows, then {AXIN[31:28] = 8, AXIN[27:14] = 0}; otherwise AXIN[31:28] is within the value range.
Step 5, latching and outputting the multiplication result: if rst is 1 and clk is active, then {AX[15:12] = AXIN[31:28], AX[11] = AXIN[27], AX[10:0] = AXIN[25:15]}, and the result is output: AX[15:0] = A[15:0] × data_abs[15:0].
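Steps 1 and 2 of the multiplier flow can be illustrated with the following Python sketch. It is a minimal model under stated assumptions, not the circuit itself: the function names are hypothetical, the exponent field is read as a 4-bit two's-complement number, the 14-bit mantissa is read as a two's-complement number with 11 fraction bits, and the AIN conversion rule (including the inverted bit 11 as hidden bit) is applied to both operands for simplicity. Normalization, saturation and latching (steps 3 to 5) are omitted.

```python
def signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def to_operand(a16: int) -> int:
    """Step 1: expand a 16-bit word into the 18-bit operand format."""
    exp = (a16 >> 12) & 0xF        # AIN[17:14] = A[15:12]
    sign = (a16 >> 11) & 1         # AIN[13] = AIN[12] = A[11]
    frac = a16 & 0x7FF             # AIN[10:0] = A[10:0]
    hidden = sign ^ 1              # AIN[11] = ~A[11]
    return (exp << 14) | (sign << 13) | (sign << 12) | (hidden << 11) | frac


def multiply_raw(a16: int, x16: int) -> tuple[int, int]:
    """Step 2: return (exponent sum, 28-bit mantissa product), unnormalized."""
    ain, xin = to_operand(a16), to_operand(x16)
    mant = signed(ain & 0x3FFF, 14) * signed(xin & 0x3FFF, 14)
    exp = signed(ain >> 14, 4) + signed(xin >> 14, 4)
    return exp, mant


def raw_value(exp: int, mant: int) -> float:
    """Numeric value of a raw product (22 fraction bits assumed)."""
    return mant / float(1 << 22) * 2.0 ** exp
```

Under these assumptions the word 0xE000 encodes 0.25 and 0x2400 encodes 6.0, so raw_value(*multiply_raw(0xE000, 0x2400)) evaluates to their product.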
The flow of the floating-point adder is shown in FIG. 11. Adding the two 16-bit floating-point data AX[15:0] and B[15:0] requires 7 steps.
Step 1, operand preparation: the 16-bit floating-point data AX[15:0] is converted into the 18-bit operand AXIN[17:0] as follows: AXIN[17:14] = AX[15:12], AXIN[13] = AX[11], AXIN[12] = AX[11], AXIN[11] = ~AX[11], AXIN[10:0] = AX[10:0].
The 16-bit floating-point data B[15:0] is converted into the 18-bit operand BIN[17:0] as follows: BIN[17:14] = B[15:12], BIN[13] = B[11], BIN[12] = B[11], BIN[11] = B[11], BIN[10:0] = B[10:0].
Step 2, calculating the exponent difference of the two floating-point operands: N = AXIN[17:14] - BIN[17:14].
Step 3, aligning the mantissas of the two floating-point operands according to the exponent difference: if N > 0, then {BIN[13:0] = BIN[13:0] >> N, YIN[17:14] = AXIN[17:14]}; otherwise {AXIN[13:0] = AXIN[13:0] >> |N|, YIN[17:14] = BIN[17:14]}.
Step 4, adding the mantissas of the two floating-point operands: YIN[13:0] = AXIN[13:0] + BIN[13:0].
Step 5, checking the mantissa addition result YIN[13:0]: if YIN[13:0] is equal to 0, then YIN[17:14] = -8; if YIN[13:0] overflows, then {YIN[13:0] = YIN[13:0] >> 1, YIN[17:14] = YIN[17:14] + 1}; if YIN[13:0] << K is a normalized number, then {YIN[13:0] = YIN[13:0] << K, YIN[17:14] = YIN[17:14] - K}.
Step 6, checking the exponent of the addition result YIN[17:14]: if YIN[17:14] overflows and YIN[13:0] > 0, then {YIN[17:14] = 7}; if YIN[17:14] overflows and YIN[13:0] < 0, then {YIN[17:14] = 8}; if YIN[17:14] underflows, then {YIN[17:14] = 8, YIN[13:0] = 0}; otherwise YIN[17:14] is within the value range.
Step 7, latching and outputting the addition result: if rst is 1 and clk is active, then {Y[15:12] = YIN[17:14], Y[11] = YIN[13], Y[10:0] = YIN[10:0]}, and Y[15:0] = AX[15:0] + B[15:0] is output.
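Steps 2 to 6 of the adder flow can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not taken from the text: the function name fp_add is hypothetical, operands are passed as already decoded (exponent, mantissa) pairs, both operands are non-negative, the mantissa is an integer scaled by 2^11 so that a normalized value lies in [2048, 4096), and on exponent overflow only the exponent is clamped to 7.

```python
def fp_add(ea: int, ma: int, eb: int, mb: int) -> tuple[int, int]:
    """Add two non-negative operands given as (exponent, mantissa) pairs."""
    n = ea - eb                      # step 2: exponent difference
    if n > 0:                        # step 3: align the smaller operand
        mb >>= n
        ey = ea
    else:
        ma >>= -n
        ey = eb
    my = ma + mb                     # step 4: mantissa addition
    if my == 0:                      # step 5: zero is encoded with exponent -8
        return -8, 0
    while my >= 4096:                # step 5: mantissa overflow, shift right
        my >>= 1
        ey += 1
    while my < 2048:                 # step 5: shift left by K until normalized
        my <<= 1
        ey -= 1
    if ey > 7:                       # step 6: exponent overflow (positive case)
        ey = 7
    elif ey < -8:                    # step 6: exponent underflow gives zero
        return -8, 0
    return ey, my
```

For example, adding 1.5 (exponent 0, mantissa 3072) and 0.5 (exponent -1, mantissa 2048) aligns the second mantissa to 1024, produces the sum 4096, and renormalizes it to exponent 1 with mantissa 2048, that is, 2.0.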
The input signals of the address generator (see FIG. 12, FIG. 13) are the clock nmcycle[1], the reset signal rst and the 16-bit data data_abs[15:0]; the output signal is the 17-bit address addr[16:0]. Its function is to generate the address used by the lookup table: a 17-bit address value is produced according to the value interval in which the data lies. The 16-bit input data data_abs[15:0] is fed simultaneously to the four 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4 for comparison. When 6.5 ≤ data_abs[15:0] ≤ 8, one bit of the output addr[3:0] of addrgen1 is 0 and the address outputs of the other address generation modules are all 1; when 4.5 ≤ data_abs[15:0] ≤ 6, one bit of the output addr[3:0] of addrgen2 is 0 and the address outputs of the other address generation modules are all 1; when 2.5 ≤ data_abs[15:0] ≤ 4, one bit of the output addr[3:0] of addrgen3 is 0 and the address outputs of the other address generation modules are all 1; when 0.5 ≤ data_abs[15:0] ≤ 2, one bit of the output addr[3:0] of addrgen4 is 0 and the address outputs of the other address generation modules are all 1. The structure of the address generator is shown in FIG. 12: the address generator comprises four 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4.
The cin terminal of addrgen1 is grounded, the cout terminal of addrgen1 is connected to the cin terminal of addrgen2, the cout terminal of addrgen2 is connected to the cin terminal of addrgen3, and the cout terminal of addrgen3 is connected to the cin terminal of addrgen4; the clk terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to nmcycle[1]; the rst terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to the externally input reset signal rst; the d[15:0] terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to the data output by the sign judgment module. The d3[15:0] terminal of addrgen1 is connected to the floating-point number corresponding to the constant (8)₁₀, the d2[15:0] terminal to the floating-point number corresponding to (7.5)₁₀, the d1[15:0] terminal to the floating-point number corresponding to (7)₁₀, and the d0[15:0] terminal to the floating-point number corresponding to (6.5)₁₀; the output out[3:0] of addrgen1 supplies the address bits addr[16:13].
The d3[15:0] terminal of addrgen2 is connected to the floating-point number corresponding to the constant (6)₁₀, the d2[15:0] terminal to the floating-point number corresponding to (5.5)₁₀, the d1[15:0] terminal to the floating-point number corresponding to (5)₁₀, and the d0[15:0] terminal to the floating-point number corresponding to (4.5)₁₀; the output out[3:0] of addrgen2 supplies the address bits addr[12:9].
The d3[15:0] terminal of addrgen3 is connected to the floating-point number corresponding to the constant (4)₁₀, the d2[15:0] terminal to the floating-point number corresponding to (3.5)₁₀, the d1[15:0] terminal to the floating-point number corresponding to (3)₁₀, and the d0[15:0] terminal to the floating-point number corresponding to (2.5)₁₀; the output out[3:0] of addrgen3 supplies the address bits addr[8:5].
The d3[15:0] terminal of addrgen4 is connected to the floating-point number corresponding to the constant (2)₁₀, the d2[15:0] terminal to the floating-point number corresponding to (1.5)₁₀, the d1[15:0] terminal to the floating-point number corresponding to (1)₁₀, and the d0[15:0] terminal to the floating-point number corresponding to (0.5)₁₀; the output out[3:0] of addrgen4 supplies the address bits addr[4:1], and the cout terminal of addrgen4 outputs addr[0].
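The overall behaviour of the four cascaded modules can be illustrated with a behavioural Python sketch. The name gen_address is hypothetical, the input is taken as an ordinary non-negative number instead of a 16-bit word, and it is assumed, following the comparator rule that an equal value counts as "greater", that the highest threshold not exceeding the input clears its address bit while all other bits stay 1.

```python
# Threshold constants wired to d3..d0 of addrgen1..addrgen4: 8.0 down to 0.5.
THRESHOLDS = [8.0 - 0.5 * k for k in range(16)]


def gen_address(data_abs: float) -> int:
    """Return the 17-bit address with exactly one bit cleared."""
    addr = 0x1FFFF
    for k, threshold in enumerate(THRESHOLDS):
        if data_abs >= threshold:
            # Threshold k drives addr[16 - k]: 8.0 -> addr[16], 0.5 -> addr[1].
            return addr & ~(1 << (16 - k))
    # No module matched: cout of addrgen4 stays 0, so addr[0] is cleared.
    return addr & ~1
```

For instance, an input of 3.2 first matches the threshold 3.0 (the d1 terminal of addrgen3), so addr[6] is the only bit cleared.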
The input signals of the 4-bit address generation module (see FIG. 13) are the clock signal nmcycle[1], the 16-bit data data_abs[15:0], the reset signal rst, the 16-bit data d0, d1, d2 and d3, and the cascade (carry) input cin; the output signals are the 4-bit address addr[3:0] and the cascade (carry) output cout. Its function is as follows. The 16-bit input data data_abs[15:0] is compared simultaneously with the constants d0, d1, d2 and d3 that delimit four value intervals of the argument of the tanh function (with d0 < d1 < d2 < d3). When data_abs[15:0] ≥ d3, addr[3] is 0, the other address outputs are 1, and the cascade output cout is 1; when d2 ≤ data_abs[15:0] < d3, addr[2] is 0, the other address outputs are 1, and cout is 1; when d1 ≤ data_abs[15:0] < d2, addr[1] is 0, the other address outputs are 1, and cout is 1; when d0 ≤ data_abs[15:0] < d1, addr[0] is 0, the other address outputs are 1, and cout is 1; when the cascade input cin is active, all bits of addr[3:0] are 1 and cout is 1. The structure of the 4-bit address generation module is shown in FIG. 13: the 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4 are identical in structure, and each comprises floating-point comparators fcom_4, fcom_5, fcom_6 and fcom_7, inverters inv_6, inv_7, inv_8 and inv_9, a two-input OR gate or21_3, a three-input OR gate or31_2, a four-input OR gate or41_0, five-input OR gates or51_0 and or51_1, and D flip-flops dff_16, dff_17, dff_18 and dff_19;
the floating-point comparators fcom_4, fcom_5, fcom_6 and fcom_7 each have two inputs;
the first input of floating-point comparator fcom_4 serves as the d[15:0] terminal of the 4-bit address generation module, and its second input serves as the d3[15:0] terminal of the 4-bit address generation module;
the first input of floating-point comparator fcom_5 is connected to the first input of fcom_4, and its second input serves as the d2[15:0] terminal of the 4-bit address generation module;
the first input of floating-point comparator fcom_6 is connected to the first input of fcom_4, and its second input serves as the d1[15:0] terminal of the 4-bit address generation module;
the first input of floating-point comparator fcom_7 is connected to the first input of fcom_4, and its second input serves as the d0[15:0] terminal of the 4-bit address generation module;
the first input of the two-input OR gate or21_3, the first input of the three-input OR gate or31_2, the first input of the four-input OR gate or41_0, the first input of the five-input OR gate or51_0 and the first input of the five-input OR gate or51_1 are connected together and serve as the cin terminal of the 4-bit address generation module;
the output of floating-point comparator fcom_4 is connected to both the input of inverter inv_6 and the second input of the two-input OR gate or21_3, and the output of or21_3 is connected to the data input D of D flip-flop dff_16;
the output of floating-point comparator fcom_5 is connected to both the input of inverter inv_7 and the third input of the three-input OR gate or31_2, the output of inverter inv_6 is connected to the second input of or31_2, and the output of or31_2 is connected to the data input D of D flip-flop dff_17;
the output of floating-point comparator fcom_6 is connected to both the input of inverter inv_8 and the fourth input of the four-input OR gate or41_0, the output of inverter inv_6 is connected to the second input of or41_0, the output of inverter inv_7 is connected to the third input of or41_0, and the output of or41_0 is connected to the data input D of D flip-flop dff_18;
the output of floating-point comparator fcom_7 is connected to both the input of inverter inv_9 and the fifth input of the five-input OR gate or51_0, the output of inverter inv_6 is connected to the second input of or51_0, the output of inverter inv_7 is connected to the third input of or51_0, the output of inverter inv_8 is connected to the fourth input of or51_0, and the output of or51_0 is connected to the data input D of D flip-flop dff_19;
the output of inverter inv_6 is connected to the second input of the five-input OR gate or51_1, the output of inverter inv_7 is connected to the third input of or51_1, the output of inverter inv_8 is connected to the fourth input of or51_1, the output of inverter inv_9 is connected to the fifth input of or51_1, and the output of or51_1 serves as the cout terminal of the 4-bit address generation module;
the reset inputs rst of D flip-flops dff_16, dff_17, dff_18 and dff_19 are connected together and serve as the rst terminal of the 4-bit address generation module, and their clock inputs clk are connected together and serve as the clk terminal of the 4-bit address generation module; dff_16 outputs address bit 3, dff_17 outputs address bit 2, dff_18 outputs address bit 1, and dff_19 outputs address bit 0.
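The gate network of one 4-bit address generation module can be written out as a small Python model. The function name addrgen is an assumption of this sketch; f3, f2, f1 and f0 stand for the outputs of fcom_4, fcom_5, fcom_6 and fcom_7 (0 meaning that data_abs is not less than d3, d2, d1 or d0 respectively), and the D flip-flops are omitted so that the function returns the values presented to their D inputs.

```python
def addrgen(cin: int, f3: int, f2: int, f1: int, f0: int) -> tuple[int, int]:
    """Return (addr[3:0], cout) for one 4-bit address generation module."""
    n3, n2, n1, n0 = f3 ^ 1, f2 ^ 1, f1 ^ 1, f0 ^ 1   # inv_6 .. inv_9
    a3 = cin | f3                                      # or21_3 -> dff_16
    a2 = cin | n3 | f2                                 # or31_2 -> dff_17
    a1 = cin | n3 | n2 | f1                            # or41_0 -> dff_18
    a0 = cin | n3 | n2 | n1 | f0                       # or51_0 -> dff_19
    cout = cin | n3 | n2 | n1 | n0                     # or51_1
    return (a3 << 3) | (a2 << 2) | (a1 << 1) | a0, cout
```

Chaining cout of one module to cin of the next reproduces the cascade of FIG. 12: once a module has matched, every following module is forced to output all ones.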
The input signals of the lookup table (see FIG. 14) are the reset signal rst, the address data addr[16:0], nmcycle[2] and the sign neg; the output signal is lut_data[15:0]. Its function is as follows. When the reset signal rst is active, the lookup table is set, that is, the value of each lookup table entry is the function value corresponding to one interval of the argument of the tanh function; neg and addr together form the index address of an entry in the lookup table; when the index address is valid, the stored value lut_data[15:0] of the corresponding lookup table entry is output. The structure of the lookup table is shown in FIG. 14: the lookup table mainly comprises 33 lookup table entries, and the glue logic comprises 33 two-input OR gates and a 16-bit D flip-flop. The stored values of the 33 lookup table entries mem_i (i = 32 … 0), from the high address bits to the low address bits, are set in sequence as follows: (1.00000)₁₀, (1.00000)₁₀, (0.99999)₁₀, (0.99999)₁₀, (0.99998)₁₀, (0.99996)₁₀, (0.99909)₁₀, (0.99975)₁₀, (0.99932)₁₀, (0.99817)₁₀, (0.99505)₁₀, (0.98661)₁₀, (0.96402)₁₀, (0.90514)₁₀, (0.76159)₁₀, (0.46211)₁₀, (0)₁₀, (-0.50000)₁₀, (-0.10000)₁₀, (-0.15000)₁₀, (-0.20000)₁₀, (-0.25000)₁₀, (-0.30000)₁₀, (-0.35000)₁₀, (-0.40000)₁₀, (-0.45000)₁₀, (-0.50000)₁₀, (-0.55000)₁₀, (-0.60000)₁₀, (-0.65000)₁₀, (-0.70000)₁₀, (-0.75000)₁₀, (-0.80000)₁₀. The outputs of the 33 two-input OR gates are connected respectively to the tri-state control terminals of the 33 lookup table entries mem_i (i = 32 … 0); neg and addr[15:0] are connected respectively to the two inputs of 17 two-input OR gates; neg and !addr[15:0] are connected respectively to the two inputs of 16 two-input OR gates.
The outputs of the 33 storage cells are connected together, but only one value can be output at a time. The output value is connected to the data terminal d of the flip-flop dff_20, nmcycle[2] is connected to the clock terminal clk of dff_20, and the output q of dff_20 is lut_data[15:0].
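The way neg and addr jointly select one entry can be sketched in Python as follows. This is a hedged illustration rather than a copy of the table: the name lut_read is hypothetical, the positive entries are recomputed with math.tanh at the interval thresholds 8.0 down to 0.5 (plus 0 for inputs below 0.5) instead of being taken from the list above, the entry for a negative input is assumed to be the negated positive entry by the odd symmetry of tanh, and the address is assumed to have exactly one bit cleared.

```python
import math

# Entry k of POSITIVE corresponds to addr[16 - k]; the last entry to addr[0].
THRESHOLDS = [8.0 - 0.5 * k for k in range(16)]
POSITIVE = [math.tanh(t) for t in THRESHOLDS] + [0.0]


def lut_read(neg: int, addr: int) -> float:
    """Return the entry selected by the sign flag and the 17-bit address."""
    low = [b for b in range(17) if not (addr >> b) & 1]
    if len(low) != 1:
        # Entry outputs share one bus, so only one entry may be enabled.
        raise ValueError("exactly one bit of addr[16:0] must be 0")
    value = POSITIVE[16 - low[0]]
    return -value if neg else value
```

With this indexing, the address 0x1FFFD (addr[1] cleared) selects the entry for the threshold 0.5, whose value is approximately 0.46211.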
The input signals of the inverse operator (see FIG. 2, FIG. 5, FIG. 15) are the output data Y[15:0], the cycle control signal mcycle[5] and the reset signal rst; the output signal is NY[15:0]. Its function is to calculate the opposite number of the input data in complement form. The structure of the inverse operator is shown in FIG. 15: the inverse operator comprises a data inverse complement operator DD2 and a D flip-flop dff_21. The input DIN of the data inverse complement operator DD2 is connected to the result output by the floating-point adder, and its output DOUT is connected to the data input D of the D flip-flop dff_21; the clock input clk of dff_21 is connected to mcycle[5], the reset input rst of dff_21 is connected to the externally input reset signal rst, and the output q of dff_21 outputs the opposite number of the output result of the floating-point adder. For the operation flow of the data inverse complement operator DD2, refer to FIG. 5.
By configuring a control word, the invention implements the two activation functions most widely used in the field of neural networks, the sigmoid function and the tanh function. The device has a simple structure and adopts a synchronous clock design, which facilitates timing checking and verification; it has a small area and low power consumption, is easy to implement on a chip, and improves practicality for embedded applications. When the device is used to calculate a neural network activation function, the processing flow is simple and easy to control, and the calculation efficiency of the activation function is improved. The address generator module and the lookup table module of the configurable implementation device can be readily extended according to the required tanh calculation precision, so as to meet changing precision requirements. The invention is therefore an ideal structure for implementing the activation function of an embedded neural network processor.
Matters not described in detail in the present invention are within the common knowledge of a person skilled in the art.

Claims (14)

1. A configurable neural network activation function implementation device is characterized by comprising a controller, a sign judgment module, a range detection module, a parameter register, a floating-point multiplier, a floating-point adder, an inverse arithmetic unit, an address generator, a lookup table, a first gating latch and a second gating latch;
a controller: generating latching control signals and operation completion signals done required in the whole operation data path of different activation functions according to the value of the configuration control signal M;
a sign judgment module: receiving input operation data and judging whether the data is positive or negative; if the data is positive, outputting the data to the range detection module, the address generator and the floating-point multiplier; otherwise, outputting the absolute value of the data to the range detection module, the address generator and the floating-point multiplier; and outputting the sign of the data to the address generator and the first gating latch;
a range detection module: judging which interval the received data value is in, and sending a range interval identification signal to the parameter register;
a parameter register: storing parameters of a linear function for approximating the sigmoid activation function, namely a linear coefficient and an offset; selecting a linear function approaching the sigmoid activation function according to the range interval identification signal, outputting a linear coefficient of the linear function to a floating point multiplier, and outputting the offset of the linear function to a floating point adder;
a floating-point multiplier: extracting the first-order coefficient of the linear function output by the range detection module from the parameter register module, calculating the product of the first-order coefficient of the linear function and the data value, and outputting the product to the floating-point adder;
a floating-point adder: extracting the offset of a linear function output by the range detection module from the parameter register module, calculating the sum of the product and the offset from the floating-point multiplier, and outputting the obtained result to the inverse arithmetic unit and the first gating latch;
an inverse operator: calculating the inverse number of the output result of the floating-point adder, and outputting the inverse number to the first gating latch;
a first gating latch: when the sign of the input operational data is positive, the calculation result from the floating-point adder is output to the second gating latch in a gating mode, and when the sign of the input operational data is negative, the calculation result from the inverse arithmetic unit is output to the second gating latch in a gating mode;
an address generator: generating a 17-bit address value as a lookup table index according to the interval in which the data value output by the sign judgment module lies;
a lookup table: storing the tanh activation function value corresponding to each data interval, searching the tanh activation function value corresponding to the input operation data according to the index of the lookup table in the address generator, and outputting the tanh activation function value to the second gating latch;
a second gating latch: and according to the value of the configuration control signal M, gating and outputting and latching the operation result of the sigmoid activation function or the operation result of the tanh activation function.
2. The configurable neural network activation function implementation device as claimed in claim 1, wherein M = 1 denotes the sigmoid activation function operation, and the controller generates the 8 latch control signals mcycle[7:1] required in the whole operation data path of the sigmoid activation function; M = 0 denotes the tanh activation function operation, and the controller generates the 4 latch control signals nmcycle[3:1] required in the whole operation data path of the tanh activation function.
3. The configurable neural network activation function implementation device as claimed in claim 2, wherein the controller comprises D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8, AND gates and21_0, and21_1, and21_2, and21_3, and21_4, and21_5, and21_6, and21_7, and21_8, and21_9 and and21_10, OR gates or21_0 and or21_1, a two-input NOR gate nor21_0, an inverter inv_0 and an inverter inv_1;
the clock inputs clk of D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8 are connected to the externally input clock signal clk; the data input D of dff_0 is connected to '1', i.e. high potential; the output q of dff_0 outputs the cycle control signal cycle0 and is also connected to the data input D of dff_1; the output q of dff_1 is connected to the data input D of dff_2; the output q of dff_2 is connected to the data input D of dff_3; the output q of dff_3 is connected to the data input D of dff_4; the output q of dff_4 is connected to the data input D of dff_5; the output q of dff_5 is connected to the data input D of dff_6; the output q of dff_6 is connected to the data input D of dff_7; the output q of dff_7 is connected to the data input D of dff_8;
the input of inverter inv_0 is connected to the configuration control signal M; the output of inverter inv_0 is connected to the first input of AND gate and21_0, the first input of AND gate and21_2 and the first input of AND gate and21_4; the output q of dff_1 is connected to the second input of and21_0 and the first input of and21_1; the configuration control signal M is connected to the second inputs of and21_1, and21_3, and21_5, and21_6, and21_7, and21_8, and21_9 and and21_10; the output q of dff_2 is connected to the second input of and21_2 and the first input of and21_3; the output q of dff_3 is connected to the second input of and21_4 and the first input of and21_5; the output q of dff_4 is connected to the first input of and21_6; the output q of dff_5 is connected to the first input of and21_7; the output q of dff_6 is connected to the first input of and21_8; the output q of dff_7 is connected to the first input of and21_9; and the output q of dff_8 is connected to the first input of and21_10;
the output of AND gate and21_0 outputs the signal nmcycle[1], the output of and21_1 outputs the signal mcycle[1], the output of and21_2 outputs the signal nmcycle[2], the output of and21_3 outputs the signal mcycle[2], the output of and21_4 outputs the signal nmcycle[3], the output of and21_5 outputs the signal mcycle[3], the output of and21_6 outputs the signal mcycle[4], the output of and21_7 outputs the signal mcycle[5], the output of and21_8 outputs the signal mcycle[6], and the output of and21_9 outputs the signal mcycle[7];
the output of AND gate and21_6 is connected to the first input of OR gate or21_1, and the output of AND gate and21_10 is connected to the second input of or21_1; the output of or21_1 outputs the completion signal done and is also connected to the first input of the two-input NOR gate nor21_0; the input of inverter inv_1 is connected to the externally input reset signal rst, and the output of inv_1 is connected to the second input of nor21_0; the output of nor21_0 is connected to the reset inputs rst of D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8; the output of AND gate and21_5 is connected to the first input of OR gate or21_0, the output of AND gate and21_9 is connected to the second input of or21_0, and the output of or21_0 outputs the signal result_clk.
4. The configurable neural network activation function implementation device as claimed in claim 2, wherein the sign determination module comprises a data inverse complement operator DD1, a 16-bit data latch and a D flip-flop dff_9;
the input terminal DIN of the data inverse complement operator DD1 and the input terminal d2 of the 16-bit data latch are both connected to the input operation data, and the output terminal DOUT of the data inverse complement operator DD1 is connected to the input terminal d1 of the 16-bit data latch;
bit 11 of the input operation data, data[11], is connected simultaneously to the control input terminal ctrl of the 16-bit data latch and the data input terminal D of the D flip-flop dff_9;
the cycle control signal cycle0 is connected simultaneously to the clock input terminal clk of the 16-bit data latch and the clock input terminal clk of the D flip-flop dff_9; the externally input reset signal rst is connected simultaneously to the reset input terminal rst of the 16-bit data latch and the reset input terminal rst of the D flip-flop dff_9;
the output terminal d3 of the 16-bit data latch outputs the absolute value data_abs[15:0] of the input operation data, and the output terminal q of the D flip-flop dff_9 outputs the sign neg of the input operation data.
5. The configurable neural network activation function implementation device as claimed in claim 2, wherein the inverse operator comprises a data inverse complement operator DD2 and a D flip-flop dff_21;
the input terminal DIN of the data inverse complement operator DD2 is connected to the result output by the floating-point adder, and its output terminal DOUT is connected to the data input terminal D of the D flip-flop dff_21;
the clock input terminal clk of the D flip-flop dff_21 is connected to mcycle[5], the reset input terminal rst of the D flip-flop dff_21 is connected to the externally input reset signal rst, and the output terminal q of the D flip-flop dff_21 outputs the negation (additive inverse) of the result output by the floating-point adder.
6. The device as claimed in claim 4 or 5, wherein the data inverse complement operator DD1 and the data inverse complement operator DD2 follow the same procedure, which comprises the following steps:
step 1, denoting the data received at the input terminal DIN as DIN[15:0], calculating the negated complement D_MID[13:0] of the mantissa of DIN[15:0], and performing steps 2, 3 and 4 simultaneously to evaluate D_MID[13:0];
step 2, determining whether D_MID[13:0] is 0, and if so, going to step 5;
step 3, determining whether D_MID[13:0] has overflowed: if D_MID[13] xor D_MID[12] is 1, D_MID[13:0] has overflowed, and the procedure goes to step 6;
step 4, if D_MID[13] xor D_MID[12] is 0, going to step 7;
step 5, setting the exponent D_MID[17:14] to the most negative value to obtain the correct representation of 0, and performing steps 8, 9 and 10 simultaneously to evaluate D_MID[17:14];
step 6, shifting D_MID[13:0] right by one bit and incrementing the exponent, D_MID[17:14] = DIN[15:12] + 1, and performing steps 8, 9 and 10 simultaneously to evaluate D_MID[17:14];
step 7, shifting D_MID[13:0] left by K bits and decreasing the exponent by K, D_MID[17:14] = DIN[15:12] - K, and performing steps 8, 9 and 10 simultaneously to evaluate D_MID[17:14];
step 8, determining whether the exponent result D_MID[17:14] overflows in the positive direction, and if so, going to step 11;
step 9, determining whether the exponent result D_MID[17:14] overflows in the negative direction, and if so, going to step 12;
step 10, determining whether the exponent result D_MID[17:14] is within range, and if so, going to step 13;
step 11, if D_MID[13:0] > 0, setting D_MID to the maximum positive floating-point value; if D_MID[13:0] < 0, setting D_MID to the maximum negative floating-point value; then going to step 13;
step 12, setting D_MID[13:0] to 0 and D_MID[17:14] to -8, and going to step 13;
step 13, removing the sign extension bit and the hidden bit from the 18-bit operation data D_MID to obtain the 16-bit floating-point data DOUT[15:0] output at the DOUT terminal, wherein DOUT[15:12] = D_MID[17:14], DOUT[11] = D_MID[12], and DOUT[10:0] = D_MID[10:0].
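The thirteen steps above can be illustrated with a short Python model. It is a sketch under stated assumptions rather than the claimed implementation: the claims give the field layout (exponent in bits 15:12, sign in bit 11, fraction in bits 10:0) and the use of exponent -8 for zero, but the value of the hidden bit is not stated here, so the sketch assumes it is the complement of the sign bit. The helper names `unpack`, `pack`, `to_float`, `is_normalized` and `negate` are hypothetical.

```python
EXP_MIN, EXP_MAX = -8, 7          # range of the 4-bit exponent

def unpack(word):
    """Split a 16-bit word into (exponent, signed mantissa incl. hidden bit)."""
    exp = (word >> 12) & 0xF
    if exp >= 8:
        exp -= 16                 # sign-extend the 4-bit exponent
    if exp == EXP_MIN:
        return exp, 0             # assumption: exponent -8 encodes zero
    sign = (word >> 11) & 1
    mant = ((1 - sign) << 11) | (word & 0x7FF)   # assumption: hidden = NOT sign
    if sign:
        mant -= 1 << 12           # mantissa value lies in [-2, -1)
    return exp, mant

def pack(exp, mant):
    """Step 13: drop sign extension and hidden bit, keep sign and fraction."""
    sign = 1 if mant < 0 else 0
    return ((exp & 0xF) << 12) | (sign << 11) | (mant & 0x7FF)

def to_float(word):
    exp, mant = unpack(word)
    return mant / 2048 * 2.0 ** exp

def is_normalized(mant):
    return 2048 <= mant <= 4095 or -4096 <= mant <= -2049

def negate(word):
    exp, mant = unpack(word)
    mid = -mant                                   # step 1
    if mid == 0:                                  # steps 2 and 5
        return pack(EXP_MIN, 0)
    m14 = mid & 0x3FFF                            # 14-bit view of D_MID[13:0]
    if ((m14 >> 13) ^ (m14 >> 12)) & 1:           # step 3: overflow test
        mid >>= 1                                 # step 6
        exp += 1
    else:
        while not is_normalized(mid):             # step 7: shift left by K
            mid <<= 1
            exp -= 1
    if exp > EXP_MAX:                             # steps 8 and 11: saturate
        return pack(EXP_MAX, 4095 if mid > 0 else -4096)
    if exp < EXP_MIN:                             # steps 9 and 12: flush to 0
        return pack(EXP_MIN, 0)
    # A result with exp == -8 shares the assumed zero encoding; that corner
    # case is not addressed by the steps above and is left as is.
    return pack(exp, mid)                         # steps 10 and 13
```

Under these assumptions, negating 1.0 (word 0x0000) takes the left-shift path with K = 1 and yields 0xF800, while negating 0xF800 takes the overflow path and returns 0x0000.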
7. The configurable neural network activation function implementation device as claimed in claim 2, wherein the range detection module comprises a floating-point comparator fcom_1, a floating-point comparator fcom_2, a floating-point comparator fcom_3, an inverter inv_3, an inverter inv_4, an inverter inv_5, a two-input OR gate or21_2, a three-input OR gate or31_0, a three-input OR gate or31_1, a D flip-flop dff_11, a D flip-flop dff_12, a D flip-flop dff_13 and a D flip-flop dff_14;
the first input terminal of the floating-point comparator fcom_1 is connected to data_abs[15:0], and the second input terminal is connected to a constant C0, where C0 = 5.0000 (decimal);
the first input terminal of the floating-point comparator fcom_2 is connected to data_abs[15:0], and the second input terminal is connected to a constant C1, where C1 = 1.0000 (decimal);
the first input terminal of the floating-point comparator fcom_3 is connected to data_abs[15:0], and the second input terminal is connected to a constant C2, where C2 = 1.0000 (decimal);
the output terminal of the floating-point comparator fcom_1 is connected simultaneously to the data input terminal D of the D flip-flop dff_11 and the input terminal of the inverter inv_3; the output terminal of the inverter inv_3 is connected simultaneously to the first input terminal of the two-input OR gate or21_2, the first input terminal of the three-input OR gate or31_0 and the first input terminal of the three-input OR gate or31_1; the output terminal of the floating-point comparator fcom_2 is connected simultaneously to the input terminal of the inverter inv_4 and the second input terminal of the two-input OR gate or21_2; the output terminal of the two-input OR gate or21_2 is connected to the data input terminal D of the D flip-flop dff_12; the output terminal of the inverter inv_4 is connected simultaneously to the second input terminal of the three-input OR gate or31_0 and the second input terminal of the three-input OR gate or31_1; the output terminal of the floating-point comparator fcom_3 is connected simultaneously to the third input terminal of the three-input OR gate or31_0 and the input terminal of the inverter inv_5; the output terminal of the inverter inv_5 is connected to the third input terminal of the three-input OR gate or31_1; the output terminal of the three-input OR gate or31_0 is connected to the data input terminal D of the D flip-flop dff_13, and the output terminal of the three-input OR gate or31_1 is connected to the data input terminal D of the D flip-flop dff_14;
the reset input terminals rst of the D flip-flops dff_11, dff_12, dff_13 and dff_14 are connected to the externally input reset signal rst, and mcycle[1] is connected to the clock input terminals clk of the D flip-flops dff_11, dff_12, dff_13 and dff_14; the output terminal q of the D flip-flop dff_14 outputs the signal range[0], the output terminal q of the D flip-flop dff_13 outputs the signal range[1], the output terminal q of the D flip-flop dff_12 outputs the signal range[2], and the output terminal q of the D flip-flop dff_11 outputs the signal range[3].
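The gate network of the range detection module reduces to four Boolean expressions, sketched below in Python. The sketch assumes the comparator polarity recited for the floating-point comparators (output 1 when the input is less than the constant, otherwise 0); the function names and the thresholds used in the example are illustrative only and are not taken from the claims.

```python
def range_bits(lt_c0, lt_c1, lt_c2):
    """Combine the three comparator outputs into [range0, range1, range2, range3].

    lt_cN is the output of the comparator fed with constant CN:
    1 when data_abs < CN, otherwise 0.
    """
    n0, n1, n2 = 1 - lt_c0, 1 - lt_c1, 1 - lt_c2   # inv_3, inv_4, inv_5
    r3 = lt_c0                  # fcom_1 output, latched by dff_11
    r2 = n0 | lt_c1             # or21_2, latched by dff_12
    r1 = n0 | n1 | lt_c2        # or31_0, latched by dff_13
    r0 = n0 | n1 | n2           # or31_1, latched by dff_14
    return [r0, r1, r2, r3]

def detect_range(x_abs, c0, c1, c2):
    """Behavioural wrapper: thresholds are passed in, not fixed by this sketch."""
    return range_bits(int(x_abs < c0), int(x_abs < c1), int(x_abs < c2))
```

For thresholds with c0 > c1 > c2, exactly one bit is 0 for any input, so the result behaves as an active-low one-hot interval identifier: for example, `detect_range(0.5, 5.0, 2.0, 1.0)` returns `[0, 1, 1, 1]`.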
8. The configurable neural network activation function implementation device as claimed in claim 2, wherein the address generator comprises four 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4;
the cin terminal of addrgen1 is grounded, the cout terminal of addrgen1 is connected to the cin terminal of addrgen2, the cout terminal of addrgen2 is connected to the cin terminal of addrgen3, and the cout terminal of addrgen3 is connected to the cin terminal of addrgen4; the clk terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to nmcycle[1]; the rst terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to the externally input reset signal rst; the d[15:0] terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to the data output by the sign determination module; the d3[15:0] terminal of addrgen1 is connected to the floating-point number corresponding to the decimal constant 8, the d2[15:0] terminal to the floating-point number corresponding to the decimal constant 7.5, the d1[15:0] terminal to the floating-point number corresponding to the decimal constant 7, and the d0[15:0] terminal to the floating-point number corresponding to the decimal constant 6.5, and the output terminal out[3:0] of addrgen1 outputs the address bits addr[16:13];
the d3[15:0] terminal of addrgen2 is connected to the floating-point number corresponding to the decimal constant 6, the d2[15:0] terminal to the floating-point number corresponding to the decimal constant 5.5, the d1[15:0] terminal to the floating-point number corresponding to the decimal constant 5, and the d0[15:0] terminal to the floating-point number corresponding to the decimal constant 4.5, and the output terminal out[3:0] of addrgen2 outputs the address bits addr[12:9];
the d3[15:0] terminal of addrgen3 is connected to the floating-point number corresponding to the decimal constant 4, the d2[15:0] terminal to the floating-point number corresponding to the decimal constant 3.5, the d1[15:0] terminal to the floating-point number corresponding to the decimal constant 3, and the d0[15:0] terminal to the floating-point number corresponding to the decimal constant 2.5, and the output terminal out[3:0] of addrgen3 outputs the address bits addr[8:5];
the d3[15:0] terminal of addrgen4 is connected to the floating-point number corresponding to the decimal constant 2, the d2[15:0] terminal to the floating-point number corresponding to the decimal constant 1.5, the d1[15:0] terminal to the floating-point number corresponding to the decimal constant 1, and the d0[15:0] terminal to the floating-point number corresponding to the decimal constant 0.5, the output terminal out[3:0] of addrgen4 outputs the address bits addr[4:1], and the cout terminal of addrgen4 outputs addr[0].
9. The device as claimed in claim 8, wherein the 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4 have the same structure, each comprising a floating-point comparator fcom_4, a floating-point comparator fcom_5, a floating-point comparator fcom_6, a floating-point comparator fcom_7, an inverter inv_6, an inverter inv_7, an inverter inv_8, an inverter inv_9, a two-input OR gate or21_3, a three-input OR gate or31_2, a four-input OR gate or41_0, a five-input OR gate or51_0, a five-input OR gate or51_1, a D flip-flop dff_16, a D flip-flop dff_17, a D flip-flop dff_18 and a D flip-flop dff_19;
the floating-point comparators fcom_4, fcom_5, fcom_6 and fcom_7 each have two input terminals;
the first input terminal of the floating-point comparator fcom_4 serves as the d[15:0] terminal of the 4-bit address generation module, and the second input terminal serves as the d3[15:0] terminal of the 4-bit address generation module;
the first input terminal of the floating-point comparator fcom_5 is connected to the first input terminal of the floating-point comparator fcom_4, and the second input terminal serves as the d2[15:0] terminal of the 4-bit address generation module;
the first input terminal of the floating-point comparator fcom_6 is connected to the first input terminal of the floating-point comparator fcom_4, and the second input terminal serves as the d1[15:0] terminal of the 4-bit address generation module;
the first input terminal of the floating-point comparator fcom_7 is connected to the first input terminal of the floating-point comparator fcom_4, and the second input terminal serves as the d0[15:0] terminal of the 4-bit address generation module;
the first input terminal of the two-input OR gate or21_3, the first input terminal of the three-input OR gate or31_2, the first input terminal of the four-input OR gate or41_0, the first input terminal of the five-input OR gate or51_0 and the first input terminal of the five-input OR gate or51_1 are connected together and serve as the cin terminal of the 4-bit address generation module;
the output terminal of the floating-point comparator fcom_4 is connected simultaneously to the input terminal of the inverter inv_6 and the second input terminal of the two-input OR gate or21_3, and the output terminal of the two-input OR gate or21_3 is connected to the input terminal D of the D flip-flop dff_16;
the output terminal of the floating-point comparator fcom_5 is connected simultaneously to the input terminal of the inverter inv_7 and the third input terminal of the three-input OR gate or31_2, the output terminal of the inverter inv_6 is connected to the second input terminal of the three-input OR gate or31_2, and the output terminal of the three-input OR gate or31_2 is connected to the input terminal D of the D flip-flop dff_17;
the output terminal of the floating-point comparator fcom_6 is connected simultaneously to the input terminal of the inverter inv_8 and the fourth input terminal of the four-input OR gate or41_0, the output terminal of the inverter inv_6 is connected to the second input terminal of the four-input OR gate or41_0, the output terminal of the inverter inv_7 is connected to the third input terminal of the four-input OR gate or41_0, and the output terminal of the four-input OR gate or41_0 is connected to the input terminal D of the D flip-flop dff_18;
the output terminal of the floating-point comparator fcom_7 is connected simultaneously to the input terminal of the inverter inv_9 and the fifth input terminal of the five-input OR gate or51_0, the output terminal of the inverter inv_6 is connected to the second input terminal of the five-input OR gate or51_0, the output terminal of the inverter inv_7 is connected to the third input terminal of the five-input OR gate or51_0, the output terminal of the inverter inv_8 is connected to the fourth input terminal of the five-input OR gate or51_0, and the output terminal of the five-input OR gate or51_0 is connected to the input terminal D of the D flip-flop dff_19;
the output terminal of the inverter inv_6 is connected to the second input terminal of the five-input OR gate or51_1, the output terminal of the inverter inv_7 is connected to the third input terminal of the five-input OR gate or51_1, the output terminal of the inverter inv_8 is connected to the fourth input terminal of the five-input OR gate or51_1, the output terminal of the inverter inv_9 is connected to the fifth input terminal of the five-input OR gate or51_1, and the output terminal of the five-input OR gate or51_1 serves as the cout terminal of the 4-bit address generation module;
the reset input terminals rst of the D flip-flops dff_16, dff_17, dff_18 and dff_19 are connected together and serve as the rst terminal of the 4-bit address generation module, and their clock input terminals clk are connected together and serve as the clk terminal of the 4-bit address generation module; the D flip-flop dff_16 outputs address bit 3, the D flip-flop dff_17 outputs address bit 2, the D flip-flop dff_18 outputs address bit 1, and the D flip-flop dff_19 outputs address bit 0.
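The cascaded address generation described in claims 8 and 9 can be modelled behaviourally as follows. This is a sketch, not the claimed circuit: ordinary Python floats stand in for the 16-bit floating-point values, the flip-flops are omitted, the comparator is assumed to output 1 when the input is less than the constant, and the names `addrgen` and `address` are hypothetical.

```python
def addrgen(x, thresholds, cin):
    """One 4-bit address generation module.

    thresholds = (d3, d2, d1, d0); returns ([out3, out2, out1, out0], cout).
    """
    lt = [int(x < t) for t in thresholds]   # fcom_4 .. fcom_7
    ge = [1 - v for v in lt]                # inv_6 .. inv_9
    out3 = cin | lt[0]                              # or21_3
    out2 = cin | ge[0] | lt[1]                      # or31_2
    out1 = cin | ge[0] | ge[1] | lt[2]              # or41_0
    out0 = cin | ge[0] | ge[1] | ge[2] | lt[3]      # or51_0
    cout = cin | ge[0] | ge[1] | ge[2] | ge[3]      # or51_1
    return [out3, out2, out1, out0], cout

def address(x):
    """Chain addrgen1..addrgen4; returns a list where element i is addr[i]."""
    constants = [(8, 7.5, 7, 6.5), (6, 5.5, 5, 4.5),
                 (4, 3.5, 3, 2.5), (2, 1.5, 1, 0.5)]
    bits, cin = [], 0                       # cin of addrgen1 is grounded
    for thresholds in constants:
        out, cin = addrgen(x, thresholds, cin)
        bits.extend(out)                    # addr[16], addr[15], ... in order
    bits.append(cin)                        # cout of addrgen4 drives addr[0]
    return bits[::-1]
```

In this model the 17-bit address is active-low one-hot: exactly one bit is 0, and its position identifies the 0.5-wide interval containing the input. For instance, an input of 5.2 clears addr[10] only.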
10. The configurable neural network activation function implementation device as claimed in claim 7 or 9, wherein the floating-point comparators fcom_1, fcom_2, fcom_3, fcom_4, fcom_5, fcom_6 and fcom_7 follow the same workflow, the workflow of each floating-point comparator being as follows:
step 1, comparing the mantissa of the input data at the first input terminal with the mantissa of the constant at the second input terminal:
adding the hidden mantissa bit according to the floating-point data format and extending the sign bit;
subtracting the mantissa of the constant at the second input terminal from the mantissa of the input data at the first input terminal, and recording the result as dm[13:0];
step 2, comparing the exponent of the input data at the first input terminal with the exponent of the constant at the second input terminal:
extending the sign bit, and subtracting the exponent of the constant at the second input terminal from the exponent of the input data at the first input terminal, the result being recorded as de[4:0];
step 3, determining the sign of the mantissa comparison result dm[13:0]:
sdm is calculated using the following formula: sdm = dm[13] xor dm[12];
when sdm is 0, dm[13:0] is positive; when sdm is 1, dm[13:0] is negative;
step 4, determining the sign of the exponent comparison result de[4:0]:
sde is calculated using the following formula: sde = de[4] xor de[3];
when sde is 0, de[4:0] is positive; when sde is 1, de[4:0] is negative;
step 5, determining the comparison result: if sde is 0 and de[4:0] is not 0, the input data at the first input terminal is greater than the constant at the second input terminal, and the output is 0; if sde is 0, de[4:0] is 0 and sdm is 0, the input data at the first input terminal is greater than the constant at the second input terminal, and the output is 0; if sde is 0, de[4:0] is 0 and sdm is 1, the input data at the first input terminal is less than the constant at the second input terminal, and the output is 1; if sde is 1, the input data at the first input terminal is less than the constant at the second input terminal, and the output is 1.
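The decision made in step 5 can be sketched as a small Python function. The sketch works on already separated exponent and mantissa values and uses a plain sign test on the differences in place of the XOR-based sign flags of steps 3 and 4; that substitution, the function name `fcom` and the integer operands are assumptions made for illustration. It is meaningful only for non-negative operands, as is the case when an absolute value is compared against a positive constant.

```python
def fcom(exp_a, mant_a, exp_b, mant_b):
    """Return 1 if operand a is less than operand b, otherwise 0.

    exp_*  : integer exponents
    mant_* : mantissas including the hidden bit, as non-negative integers
    """
    dm = mant_a - mant_b          # step 1: mantissa difference
    de = exp_a - exp_b            # step 2: exponent difference
    sdm = int(dm < 0)             # step 3 (plain sign test, see lead-in)
    sde = int(de < 0)             # step 4 (plain sign test, see lead-in)
    # step 5: the exponent decides first, the mantissa breaks ties
    if sde == 1:
        return 1
    if de != 0:
        return 0
    return sdm
```

Equal operands give output 0, the same output as the "greater than" case.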
11. The configurable neural network activation function implementation device as claimed in claim 2, wherein the parameter register comprises four 32-bit tri-state control gates and a 32-bit D flip-flop dff_15; the control signal terminal of the j-th 32-bit tri-state control gate is connected to bit j of the range interval identification signal output by the range detection module, where j = 0, 1, 2, 3; the reset signal terminals of the four 32-bit tri-state control gates are all connected to the externally input reset signal rst, and the output terminals of the four 32-bit tri-state control gates are all connected to the data input terminal D of the 32-bit D flip-flop dff_15; the reset input terminal rst of the 32-bit D flip-flop dff_15 is connected to the externally input reset signal rst, and the clock input terminal clk of the 32-bit D flip-flop dff_15 is connected to mcycle[2]; the upper 16 bits output at the output terminal q of the 32-bit D flip-flop dff_15 are the first-order coefficient A[15:0], and the lower 16 bits are the offset B[15:0].
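As a rough illustration of the parameter register, the Python sketch below picks one 32-bit word according to the range signal and splits it into A and B. It assumes that a tri-state gate drives the bus when its control bit is 0 (the polarity is not stated in this claim), and the function name and table contents are placeholders rather than coefficients from the patent.

```python
def select_params(range_bits, table):
    """Return (A, B) for the interval flagged in range_bits.

    range_bits : [range0, range1, range2, range3]
    table      : four 32-bit words, one per tri-state control gate
    Assumption: a gate is enabled when its control bit is 0.
    """
    enabled = [j for j, bit in enumerate(range_bits) if bit == 0]
    if len(enabled) != 1:
        raise ValueError("exactly one tri-state gate must drive the bus")
    word = table[enabled[0]] & 0xFFFFFFFF
    a = (word >> 16) & 0xFFFF     # upper 16 bits: first-order coefficient A
    b = word & 0xFFFF             # lower 16 bits: offset B
    return a, b
```

For example, with the placeholder table `[0x11112222, 0x33334444, 0x55556666, 0x77778888]` and `range_bits = [1, 0, 1, 1]`, the function returns `(0x3333, 0x4444)`.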
12. The configurable neural network activation function implementation device as claimed in claim 2, wherein the lookup table comprises 33 16-bit tri-state control gates, 33 two-input OR gates, 16 inverters and one 16-bit D flip-flop dff_20;
the second input terminals of the first 17 two-input OR gates are all connected to the sign output by the sign determination module, and among the first 17 two-input OR gates, the first input terminal of the k1-th two-input OR gate is connected to bit k1 of the address generated by the address generator, where k1 = 0, 1, 2, ..., 16;
the last 16 two-input OR gates correspond one-to-one to the 16 inverters; the second input terminal of each of the last 16 two-input OR gates is connected to the output terminal of the corresponding inverter, and the input terminals of the 16 inverters are all connected to the sign output by the sign determination module; among the last 16 two-input OR gates, the first input terminal of the k2-th two-input OR gate is connected to bit (k2 - 16) of the address generated by the address generator, where k2 = 17, 18, 19, ..., 32;
the output terminal of each two-input OR gate is connected to the second input terminal of the corresponding 16-bit tri-state control gate; the first input terminals of the 33 16-bit tri-state control gates and the reset input terminal rst of the 16-bit D flip-flop dff_20 are connected to the externally input reset signal rst; the output terminals of the 33 16-bit tri-state control gates are connected together and then connected to the data input terminal D of the 16-bit D flip-flop dff_20; the clock input terminal clk of the 16-bit D flip-flop dff_20 is connected to nmcycle[2], and the output terminal q of the 16-bit D flip-flop dff_20 outputs the tanh activation function value lut_data[15:0] corresponding to the input operation data.
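The entry selection performed by the 33 OR gates can be sketched in Python as follows. The sketch assumes that a tri-state control gate is enabled when the output of its OR gate is 0, which is consistent with an active-low one-hot address but is not stated in this claim; the function name `lut_select` is hypothetical and the table contents are not modelled.

```python
def lut_select(addr, neg):
    """Return the indices (0..32) of the lookup-table entries that are enabled.

    addr : list of 17 address bits, addr[i] being address bit i
    neg  : sign output by the sign determination module (1 for negative input)
    Assumption: an entry is enabled when its OR-gate output is 0.
    """
    not_neg = 1 - neg                         # the 16 inverters
    controls = []
    for k1 in range(17):                      # first 17 OR gates
        controls.append(addr[k1] | neg)
    for k2 in range(17, 33):                  # last 16 OR gates
        controls.append(addr[k2 - 16] | not_neg)
    return [k for k, c in enumerate(controls) if c == 0]
```

Under that assumption, an address with only bit 10 cleared selects entry 10 for a non-negative input and entry 26 for a negative input.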
13. The configurable neural network activation function implementation device as claimed in claim 2, wherein the first gating latch comprises 16 gating latch units of identical structure;
the p-th gating latch unit comprises a selector mux_1, an inverter inv_2 and a D flip-flop dff_22, wherein the input terminal A of the selector mux_1 is connected to bit p of the calculation result of the floating-point adder, the input terminal B of the selector mux_1 is connected to bit p of the calculation result of the inverse operator, and the output terminal of the selector mux_1 is connected to the input terminal D of the D flip-flop dff_22; the input terminal of the inverter inv_2 and the AS terminal of the selector mux_1 are both connected to the data sign output by the sign determination module, and the output terminal of the inverter inv_2 is connected to the BS terminal of the selector mux_1; the clock input terminal clk of the D flip-flop dff_22 is connected to mcycle[6], the reset input terminal rst of the D flip-flop dff_22 is connected to the externally input reset signal, and the output terminal of the D flip-flop dff_22 outputs bit p of the output data of the first gating latch, where p = 0, 1, 2, 3, ..., 15.
14. The configurable neural network activation function implementation device as claimed in claim 2, wherein the second gating latch comprises 16 gating latch units of identical structure;
the q-th gating latch unit comprises a selector mux_2, an inverter inv_2' and a D flip-flop dff_23, wherein the input terminal A of the selector mux_2 is connected to bit q of the output result of the lookup table, the input terminal B of the selector mux_2 is connected to bit q of the output result of the first gating latch, and the output terminal of the selector mux_2 is connected to the input terminal D of the D flip-flop dff_23; the input terminal of the inverter inv_2' and the AS terminal of the selector mux_2 are both connected to the configuration control signal M, and the output terminal of the inverter inv_2' is connected to the BS terminal of the selector mux_2; the clock input terminal clk of the D flip-flop dff_23 is connected to a cycle control signal of the controller, the reset input terminal rst of the D flip-flop dff_23 is connected to the externally input reset signal, and the output terminal of the D flip-flop dff_23 outputs bit q of the output data of the second gating latch.
CN201910041332.7A 2019-01-16 2019-01-16 Configurable neural network activation function implementation device Active CN109816105B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910041332.7A CN109816105B (en) 2019-01-16 2019-01-16 Configurable neural network activation function implementation device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910041332.7A CN109816105B (en) 2019-01-16 2019-01-16 Configurable neural network activation function implementation device

Publications (2)

Publication Number Publication Date
CN109816105A CN109816105A (en) 2019-05-28
CN109816105B true CN109816105B (en) 2021-02-23

Family

ID=66604394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910041332.7A Active CN109816105B (en) 2019-01-16 2019-01-16 Configurable neural network activation function implementation device

Country Status (1)

Country Link
CN (1) CN109816105B (en)

Also Published As

Publication number Publication date
CN109816105A (en) 2019-05-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant