CN109816105A - A configurable neural network activation function implementation device - Google Patents
- Publication number
- CN109816105A, CN201910041332.7A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Logic Circuits (AREA)
Abstract
The invention discloses a configurable neural network activation function implementation device comprising a controller, a sign determination module, a range detection module, a parameter register, a floating-point multiplier, a floating-point adder, a negation (opposite-number) unit, an address generator, a look-up table, a first gating latch, and a second gating latch. The configuration control signal M selects between the sigmoid and tanh operations. The device has a simple structure and uses a synchronous clock design, which eases timing checking and verification; it is small in area, low in power, and easy to implement on chip, which makes it more practical for embedded applications. When the device is used to compute a neural network activation function, the processing flow is simple and easy to control, which improves the efficiency of the computation. The address generator module and the look-up table unit can be readily extended according to the required tanh precision, so the device can follow changes in the required function precision. The invention is therefore a well-suited structure for implementing activation functions in embedded neural network processors.
Description
Technical field
The present invention relates to a configurable neural network activation function implementation device and belongs to the field of computer technology.
Background
In recent years, applications of artificial intelligence have developed rapidly. In particular, various machine-learning-based neural network structures have made significant progress in application fields such as data mining, classification, and image and speech recognition. Demand is growing explosively for neural network hardware, trained for a specific intelligent application, that improves the intelligent processing capability of embedded application systems. The activation function is the core of a neuron's function in a neural network. How to implement the activation function efficiently and concisely has become one of the key technical problems in embedded neural network applications. Activation functions are transcendental functions, and their concrete hardware implementation is a core technology that is subject to foreign restrictions.
Summary of the invention
Technical problem solved by the invention: to provide a configurable neural network activation function implementation device that is simple in structure, small in area, low in power, and easy to implement on chip. When the device is used to compute a neural network activation function, the processing flow is simple and easy to control, which improves the efficiency of the computation and makes the device more practical for embedded applications.
The technical solution of the invention is as follows:
A configurable neural network activation function implementation device comprises a controller, a sign determination module, a range detection module, a parameter register, a floating-point multiplier, a floating-point adder, a negation (opposite-number) unit, an address generator, a look-up table, a first gating latch, and a second gating latch;
Controller: according to the value of the configuration control signal M, generates the latch control signals needed along the entire operation data path of the selected activation function, as well as the operation-complete signal done;
Sign determination module: receives the input operand and determines its sign. If the operand is positive, it is output unchanged to the range detection module, the address generator, and the floating-point multiplier; otherwise its absolute value is output to the range detection module, the address generator, and the floating-point multiplier. At the same time, the sign of the operand is output to the address generator and the first gating latch;
Range detection module: determines which interval the received data value falls in and sends a range interval identification signal to the parameter register;
Parameter register: stores the parameters of the linear functions used to approximate the sigmoid activation function, namely the first-order coefficient and the offset. According to the range interval identification signal, it selects the linear function that approximates the sigmoid activation function, outputs that function's first-order coefficient to the floating-point multiplier, and outputs its offset to the floating-point adder;
Floating-point multiplier: takes from the parameter register the first-order coefficient of the linear function selected by the range detection module, computes the product of that coefficient and the data value, and outputs the product to the floating-point adder;
Floating-point adder: takes from the parameter register the offset of the linear function selected by the range detection module, computes the sum of the product from the floating-point multiplier and the offset, and outputs the result to the negation unit and the first gating latch;
Negation unit: computes the opposite number of the floating-point adder's output and outputs it to the first gating latch;
First gating latch: when the sign of the input operand is positive, gates the result from the floating-point adder through to the second gating latch; when the sign of the input operand is negative, gates the result from the negation unit through to the second gating latch;
Address generator: according to the interval in which the data value output by the sign determination module falls, generates a 17-bit address value that serves as the look-up table index;
Look-up table: stores the tanh activation function value corresponding to each data interval; using the look-up table index from the address generator, looks up the tanh activation function value corresponding to the input operand and outputs it to the second gating latch;
Second gating latch: according to the value of the configuration control signal M, selects and latches either the sigmoid result or the tanh result.
M = 1 selects the sigmoid activation function, and the controller generates the 8 latch control signals mcycle[7:1] needed along the entire sigmoid operation data path; M = 0 selects the tanh activation function, and the controller generates the 4 latch control signals nmcycle[3:1] needed along the entire tanh operation data path.
The controller comprises D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7, and dff_8; AND gates and21_0, and21_1, and21_2, and21_3, and21_4, and21_5, and21_6, and21_7, and21_8, and21_9, and and21_10; OR gates or21_0 and or21_1; two-input NOR gate nor21_0; and inverters inv_0 and inv_1;
The clock inputs clk of D flip-flops dff_0 through dff_8 are connected to the external input clock signal clk. The data input d of dff_0 is connected to logic "1" (high level). The output q of dff_0 both outputs the cycle control signal cycle0 and drives the data input d of dff_1. The flip-flops form a chain: the output q of dff_1 is connected to the data input d of dff_2; the output q of dff_2 to the data input d of dff_3; the output q of dff_3 to the data input d of dff_4; the output q of dff_4 to the data input d of dff_5; the output q of dff_5 to the data input d of dff_6; the output q of dff_6 to the data input d of dff_7; and the output q of dff_7 to the data input d of dff_8;
The input of inverter inv_0 is connected to the configuration control signal M. The output of inv_0 is connected to the first inputs of AND gates and21_0, and21_2, and and21_4. The output q of dff_1 is connected to both the second input of and21_0 and the first input of and21_1. The configuration control signal M is connected to the second inputs of and21_1, and21_3, and21_5, and21_6, and21_7, and21_8, and21_9, and and21_10. The output q of dff_2 is connected to both the second input of and21_2 and the first input of and21_3; the output q of dff_3 to both the second input of and21_4 and the first input of and21_5; the output q of dff_4 to the first input of and21_6; the output q of dff_5 to the first input of and21_7; the output q of dff_6 to the first input of and21_8; the output q of dff_7 to the first input of and21_9; and the output q of dff_8 to the first input of and21_10;
The output of and21_0 is signal nmcycle[1]; the output of and21_1 is mcycle[1]; the output of and21_2 is nmcycle[2]; the output of and21_3 is mcycle[2]; the output of and21_4 is nmcycle[3]; the output of and21_5 is mcycle[3]; the output of and21_6 is mcycle[4]; the output of and21_7 is mcycle[5]; the output of and21_8 is mcycle[6]; and the output of and21_9 is mcycle[7];
The output of and21_6 is connected to the first input of OR gate or21_1, and the output of and21_10 to the second input of or21_1. The output of or21_1 both outputs the operation-complete signal done and drives the first input of the two-input NOR gate nor21_0. The input of inverter inv_1 is connected to the external reset signal rst, and the output of inv_1 to the second input of nor21_0. The output of nor21_0 is connected to the reset inputs rst of dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7, and dff_8. The output of and21_5 is connected to the first input of OR gate or21_0, and the output of and21_9 to the second input of or21_0; the output of or21_0 is the signal result_clk.
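The flip-flop chain and AND gating described above can be sketched behaviorally in Python. This is a minimal sketch under stated assumptions, not the patented circuit: every flip-flop is assumed to start at 0 after reset and to sample on the same clock edge, and the done, self-reset, and result_clk paths are left out. The names `controller_trace` and `first_high` are hypothetical helpers introduced only for this illustration.

```python
def controller_trace(m, clocks=9):
    """Simulate dff_0..dff_8 and the AND gating for `clocks` rising edges.

    Assumption: all flip-flops hold 0 after reset and sample simultaneously.
    """
    q = [0] * 9  # outputs of dff_0 .. dff_8
    trace = []
    for _ in range(clocks):
        # dff_0 loads the constant 1; every other flop takes its predecessor's q.
        q = [1] + q[:-1]
        # nmcycle[k] = (not M) and q of dff_k, for k = 1..3 (tanh path).
        nmcycle = {k: (1 - m) & q[k] for k in (1, 2, 3)}
        # mcycle[k] = M and q of dff_k, for k = 1..7 (sigmoid path).
        mcycle = {k: m & q[k] for k in range(1, 8)}
        trace.append({"cycle0": q[0], "nmcycle": nmcycle, "mcycle": mcycle})
    return trace


def first_high(trace, group, k):
    """Return the 1-based clock edge at which signal group[k] first goes high."""
    for n, step in enumerate(trace, start=1):
        if step[group][k]:
            return n
    return None
```

Because dff_0 is fed a constant 1, the chain fills with ones one stage per clock, so each control signal rises one clock after the previous one, and only the group selected by M ever becomes active.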
The sign determination module comprises a data negation (opposite-number complement) unit DD1, a 16-bit data latch, and D flip-flop dff_9;
The input DIN of DD1 and the input d2 of the 16-bit data latch are both connected to the input operand; the output DOUT of DD1 is connected to the input d1 of the 16-bit data latch;
Bit 11 of the input operand, data[11], is connected to both the control input ctrl of the 16-bit data latch and the data input d of dff_9;
The cycle control signal cycle0 is connected to both the clock input clk of the 16-bit data latch and the clock input clk of dff_9; the external reset signal rst is connected to both the reset input rst of the 16-bit data latch and the reset input rst of dff_9;
The output d3 of the 16-bit data latch outputs the absolute value data_abs[15:0] of the input operand, and the output q of dff_9 outputs the sign neg of the input operand.
The negation unit comprises data negation (opposite-number complement) unit DD2 and D flip-flop dff_21;
The input DIN of DD2 is connected to the result output by the floating-point adder, and the output DOUT of DD2 is connected to the data input d of dff_21;
The clock input clk of dff_21 is connected to mcycle[5], the reset input rst of dff_21 is connected to the external reset signal rst, and the output q of dff_21 outputs the opposite number of the floating-point adder's result.
Data negation units DD1 and DD2 follow the same procedure, as follows:
Step 1: denote the data received at input DIN as DIN[15:0]; compute the two's-complement opposite number of the mantissa of DIN[15:0], denoted D_MID[13:0]; then perform steps 2, 3, and 4 simultaneously to evaluate D_MID[13:0];
Step 2: determine whether D_MID[13:0] is 0; if so, go to step 5;
Step 3: determine whether D_MID[13:0] has overflowed, where D_MID[13] xor D_MID[12] = 1 indicates overflow; if it has overflowed, go to step 6;
Step 4: if D_MID[13] xor D_MID[12] = 0, go to step 7;
Step 5: set D_MID[17:14] to the most negative value, giving the correct representation of 0; then perform steps 8, 9, and 10 simultaneously to evaluate D_MID[17:14];
Step 6: shift D_MID[13:0] right by one bit and set the exponent D_MID[17:14] = DIN[15:12] + 1; then perform steps 8, 9, and 10 simultaneously to evaluate D_MID[17:14];
Step 7: shift D_MID[13:0] left by K bits and set the exponent D_MID[17:14] = DIN[15:12] - K; then perform steps 8, 9, and 10 simultaneously to evaluate D_MID[17:14];
Step 8: determine whether the exponent result D_MID[17:14] has overflowed in the positive direction; if so, go to step 11;
Step 9: determine whether the exponent result D_MID[17:14] has overflowed in the negative direction; if so, go to step 12;
Step 10: determine whether the exponent result D_MID[17:14] is within range; if so, go to step 13;
Step 11: if D_MID[13:0] > 0, set D_MID to the largest positive floating-point value; if D_MID[13:0] < 0, set D_MID to the largest negative floating-point value; go to step 13;
Step 12: set D_MID[13:0] to 0 and D_MID[17:14] to -8; go to step 13;
Step 13: remove the sign-extension bit and the implicit bit from the 18-bit data D_MID and output the result, giving the 16-bit floating-point data DOUT[15:0] at output DOUT, where DOUT[15:12] = D_MID[17:14], DOUT[11] = D_MID[12], and DOUT[10:0] = D_MID[10:0].
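The overflow test in step 3, D_MID[13] xor D_MID[12], is the classic double-sign-bit check. The sketch below illustrates only that check, assuming the mantissa with its implicit bit is a 13-bit two's-complement value that is sign-extended to 14 bits before negation; that width interpretation is an assumption, and the normalization and exponent handling of steps 5 to 13 are not modeled. `negate_mantissa` is a hypothetical name.

```python
WIDTH = 14
MASK = (1 << WIDTH) - 1


def negate_mantissa(m13):
    """Negate a 13-bit two's-complement mantissa in a 14-bit field.

    Returns (d_mid, overflow), where d_mid is the 14-bit pattern and
    overflow is d_mid[13] xor d_mid[12].
    """
    if not -4096 <= m13 <= 4095:
        raise ValueError("mantissa does not fit in 13 bits")
    extended = m13 & MASK        # sign-extend into 14 bits
    d_mid = (-extended) & MASK   # two's-complement negation, modulo 2**14
    overflow = ((d_mid >> 13) ^ (d_mid >> 12)) & 1
    return d_mid, bool(overflow)
```

Only the most negative mantissa, -4096, overflows: its negation +4096 needs the extra bit, so the two top bits differ and the value would have to be shifted right with the exponent incremented, as step 6 describes.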
The range detection module comprises floating-point comparators fcom_1, fcom_2, and fcom_3; inverters inv_3, inv_4, and inv_5; two-input OR gate or21_2; three-input OR gates or31_0 and or31_1; and D flip-flops dff_11, dff_12, dff_13, and dff_14;
The first input of fcom_1 is connected to the data data_abs[15:0] output by the sign determination module, and its second input to constant C0, where C0 is 5.0000 (decimal);
The first input of fcom_2 is connected to the data data_abs[15:0] output by the sign determination module, and its second input to constant C1, where C1 is 1.0000 (decimal);
The first input of fcom_3 is connected to the data data_abs[15:0] output by the sign determination module, and its second input to constant C2, where C2 is 1.0000 (decimal);
The output of fcom_1 is connected to both the data input d of dff_11 and the input of inverter inv_3. The output of inv_3 is connected to the first inputs of two-input OR gate or21_2, three-input OR gate or31_0, and three-input OR gate or31_1. The output of fcom_2 is connected to both the input of inverter inv_4 and the second input of or21_2; the output of or21_2 is connected to the data input d of dff_12. The output of inv_4 is connected to the second inputs of or31_0 and or31_1. The output of fcom_3 is connected to both the third input of or31_0 and the input of inverter inv_5; the output of inv_5 is connected to the third input of or31_1. The output of or31_0 is connected to the data input d of dff_13, and the output of or31_1 to the data input d of dff_14;
The reset inputs rst of dff_11, dff_12, dff_13, and dff_14 are connected to the external reset signal rst, and their clock inputs clk are connected to mcycle[1]. The output q of dff_14 is signal range[0], the output q of dff_13 is range[1], the output q of dff_12 is range[2], and the output q of dff_11 is range[3].
The address generator comprises four 4-bit address generation modules addrgen1, addrgen2, addrgen3, and addrgen4;
The cin terminal of addrgen1 is grounded; the cout terminal of addrgen1 is connected to the cin terminal of addrgen2, the cout terminal of addrgen2 to the cin terminal of addrgen3, and the cout terminal of addrgen3 to the cin terminal of addrgen4. The clk terminals of addrgen1, addrgen2, addrgen3, and addrgen4 are connected to nmcycle[1], and their rst terminals to the external reset signal rst. Their d[15:0] terminals are connected to the data output by the sign determination module. For addrgen1, the d3[15:0] terminal is connected to the floating-point value of the constant 8 (decimal), d2[15:0] to that of 7.5, d1[15:0] to that of 7, and d0[15:0] to that of 6.5; the output out[3:0] of addrgen1 is address addr[16:13];
For addrgen2, the d3[15:0] terminal is connected to the floating-point value of the constant 6 (decimal), d2[15:0] to that of 5.5, d1[15:0] to that of 5, and d0[15:0] to that of 4.5; the output out[3:0] of addrgen2 is address addr[12:9];
For addrgen3, the d3[15:0] terminal is connected to the floating-point value of the constant 4 (decimal), d2[15:0] to that of 3.5, d1[15:0] to that of 3, and d0[15:0] to that of 2.5; the output out[3:0] of addrgen3 is address addr[8:5];
For addrgen4, the d3[15:0] terminal is connected to the floating-point value of the constant 2 (decimal), d2[15:0] to that of 1.5, d1[15:0] to that of 1, and d0[15:0] to that of 0.5; the output out[3:0] of addrgen4 is address addr[4:1], and the cout terminal of addrgen4 outputs addr[0].
The 4-bit address generation modules addrgen1, addrgen2, addrgen3, and addrgen4 have the same structure. Each comprises floating-point comparators fcom_4, fcom_5, fcom_6, and fcom_7; inverters inv_6, inv_7, inv_8, and inv_9; two-input OR gate or21_3; three-input OR gate or31_2; four-input OR gate or41_0; five-input OR gates or51_0 and or51_1; and D flip-flops dff_16, dff_17, dff_18, and dff_19;
Floating-point comparators fcom_4, fcom_5, fcom_6, and fcom_7 each have two inputs;
The first input of fcom_4 serves as the d[15:0] terminal of the 4-bit address generation module, and its second input as the d3[15:0] terminal of the module;
The first input of fcom_5 is connected to the first input of fcom_4, and its second input serves as the d2[15:0] terminal of the module;
The first input of fcom_6 is connected to the first input of fcom_4, and its second input serves as the d1[15:0] terminal of the module;
The first input of fcom_7 is connected to the first input of fcom_4, and its second input serves as the d0[15:0] terminal of the module;
The first inputs of two-input OR gate or21_3, three-input OR gate or31_2, four-input OR gate or41_0, five-input OR gate or51_0, and five-input OR gate or51_1 are tied together and serve as the cin terminal of the module;
The output of fcom_4 is connected to both the input of inverter inv_6 and the second input of or21_3; the output of or21_3 is connected to the data input d of dff_16;
The output of fcom_5 is connected to both the input of inverter inv_7 and the third input of or31_2; the output of inv_6 is connected to the second input of or31_2; the output of or31_2 is connected to the data input d of dff_17;
The output of fcom_6 is connected to both the input of inverter inv_8 and the fourth input of or41_0; the output of inv_6 is connected to the second input of or41_0 and the output of inv_7 to its third input; the output of or41_0 is connected to the data input d of dff_18;
The output of fcom_7 is connected to both the input of inverter inv_9 and the fifth input of or51_0; the output of inv_6 is connected to the second input of or51_0, the output of inv_7 to its third input, and the output of inv_8 to its fourth input; the output of or51_0 is connected to the data input d of dff_19;
The output of inv_6 is connected to the second input of or51_1, the output of inv_7 to its third input, the output of inv_8 to its fourth input, and the output of inv_9 to its fifth input; the output of or51_1 serves as the cout terminal of the module;
The reset inputs rst of dff_16, dff_17, dff_18, and dff_19 are tied together and serve as the rst terminal of the module, and their clock inputs clk are tied together and serve as the clk terminal of the module. dff_16 outputs address bit 3, dff_17 outputs address bit 2, dff_18 outputs address bit 1, and dff_19 outputs address bit 0.
Floating-point comparators fcom_1, fcom_2, fcom_3, fcom_4, fcom_5, fcom_6, and fcom_7 share the same workflow, as follows:
Step 1: compare the mantissas of the first-input data and the second-input constant:
According to the floating-point data format, add the implicit bit to each mantissa and extend it by one sign bit;
Subtract the mantissa of the second-input constant from the mantissa of the first-input data; denote the result dm[13:0];
Step 2: compare the exponents of the first-input data and the second-input constant:
Extend each exponent by one sign bit and subtract the exponent of the second-input constant from the exponent of the first-input data; denote the result de[4:0];
Step 3: determine the sign of the mantissa comparison result dm[13:0]:
Compute sdm = dm[13] xor dm[12];
When sdm is 0, dm[13:0] is positive; when sdm is 1, dm[13:0] is negative;
Step 4: determine the sign of the exponent comparison result de:
Compute sde = de[4] xor de[3];
When sde is 0, de[4:0] is positive; when sde is 1, de[4:0] is negative;
Step 5: determine the comparison result. If sde is 0 and de[4:0] is not 0, the first-input data is greater than the second-input constant and the output is 0; if sde is 0, de[4:0] is 0, and sdm is 0, the first-input data is greater than the second-input constant and the output is 0; if sde is 0, de[4:0] is 0, and sdm is 1, the first-input data is less than the second-input constant and the output is 1; if sde is 1, the first-input data is less than the second-input constant and the output is 1.
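The decision in step 5 amounts to comparing exponents first and falling back to the mantissas only when the exponents are equal. A minimal sketch under simplifying assumptions: both operands are normalized, non-negative values given as plain integer (exponent, mantissa) pairs, and the sign-extension and xor tricks of steps 1 to 4 are replaced by ordinary integer subtraction. `fcom` is a hypothetical name for this model.

```python
def fcom(a, b):
    """Return 1 if a < b, else 0, for (exponent, mantissa) integer pairs.

    Assumption: both values are normalized and non-negative, so a larger
    exponent always means a larger value.
    """
    de = a[0] - b[0]  # exponent difference (step 2)
    dm = a[1] - b[1]  # mantissa difference (step 1)
    if de > 0:        # larger exponent: first input is greater
        return 0
    if de < 0:        # smaller exponent: first input is less
        return 1
    # Equal exponents: the sign of the mantissa difference decides.
    return 1 if dm < 0 else 0
```

Equal inputs give de = 0 and dm = 0, which falls into the "output is 0" case, so the comparator output is 1 only for a strictly smaller first input.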
The parameter register comprises four 32-bit tri-state gates and a 32-bit D flip-flop dff_15. The control input of the j-th 32-bit tri-state gate is connected to bit j of the range interval identification signal output by the range detection module, j = 0, 1, 2, 3. The reset inputs of the four 32-bit tri-state gates are all connected to the external reset signal rst. The outputs of the four 32-bit tri-state gates are connected to the data input d of dff_15. The reset input rst of dff_15 is connected to the external reset signal rst, and the clock input clk of dff_15 is connected to mcycle[2]. The upper 16 bits of the output q of dff_15 are the first-order coefficient A[15:0], and the lower 16 bits are the offset B[15:0].
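The idea of selecting a coefficient A and an offset B per interval and computing A·x + B can be sketched as follows. The patent text does not list the stored parameter values, so the table below borrows the commonly cited PLAN piecewise-linear approximation of sigmoid as a stand-in; the coefficients and the breakpoint 2.375 are assumptions, not values from the patent. Only non-negative inputs are handled, and `SEGMENTS` and `pwl_sigmoid` are hypothetical names.

```python
import math

# (upper bound of interval, A, B); stand-in values from the PLAN approximation.
SEGMENTS = [
    (1.0, 0.25, 0.5),
    (2.375, 0.125, 0.625),
    (5.0, 0.03125, 0.84375),
    (math.inf, 0.0, 1.0),
]


def pwl_sigmoid(x_abs):
    """Approximate sigmoid(x) for x >= 0 with one multiply and one add."""
    if x_abs < 0:
        raise ValueError("expects the absolute value from the sign module")
    for upper, a, b in SEGMENTS:
        if x_abs < upper:      # range detection: first interval containing x
            return a * x_abs + b
    return 1.0                 # not reached: the last interval is unbounded
```

With these stand-in coefficients the approximation stays within about 0.02 of the true sigmoid on non-negative inputs, and every coefficient is a power of two, which is one reason this particular table is popular in hardware.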
The look-up table comprises 33 16-bit tri-state gates, 33 two-input OR gates, 16 inverters, and one 16-bit D flip-flop dff_20;
The second inputs of the first 17 two-input OR gates are connected to the sign output by the sign determination module; among these first 17 gates, the first input of the k1-th two-input OR gate is connected to bit k1 of the address value generated by the address generator, k1 = 0, 1, 2, ..., 16;
The last 16 two-input OR gates correspond one-to-one with the 16 inverters. The second input of each of the last 16 two-input OR gates is connected to the output of its corresponding inverter, and the inputs of the 16 inverters are connected to the sign output by the sign determination module. Among these last 16 gates, the first input of the k2-th two-input OR gate is connected to bit k2 - 16 of the address value generated by the address generator, k2 = 17, 18, 19, ..., 32;
The output of each two-input OR gate is connected to the second input of its corresponding 16-bit tri-state gate. The first inputs of the 33 16-bit tri-state gates and the reset input rst of the 16-bit D flip-flop dff_20 are connected to the external reset signal rst. The outputs of the 33 16-bit tri-state gates are tied together and connected to the data input d of dff_20. The clock input clk of dff_20 is connected to nmcycle[2], and the output q of dff_20 outputs the tanh activation function value lut_data[15:0] corresponding to the input operand.
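A table indexed by interval and by sign can be modeled in a few lines. This is a sketch under stated assumptions: the intervals are 0.5 wide up to 8, each entry holds the tanh of the interval midpoint (the patent text does not say which representative value is stored), inputs of 8 or more saturate to 1, and negative inputs use the odd symmetry tanh(-x) = -tanh(x) instead of separate negative entries. `TABLE` and `tanh_lut` are hypothetical names.

```python
import math

STEP = 0.5
ENTRIES = 17  # 16 intervals below 8, plus one saturated entry for 8 and above

# Assumption: each entry stores tanh at the midpoint of its interval.
TABLE = [math.tanh((k + 0.5) * STEP) for k in range(ENTRIES - 1)] + [1.0]


def tanh_lut(x):
    """Look up an approximate tanh(x) by interval, applying odd symmetry."""
    index = min(int(abs(x) / STEP), ENTRIES - 1)
    value = TABLE[index]
    return -value if x < 0 else value
```

With only 17 entries the result is coarse near zero, where tanh changes fastest; narrowing the intervals and enlarging the table is the kind of extension the description has in mind when it ties table size to the required tanh precision.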
The first gating latch comprises 16 structurally identical gating latch units;
The p-th gating latch unit comprises selector mux_1, inverter inv_2, and D flip-flop dff_22. The A input of mux_1 is connected to bit p of the floating-point adder's result, and the B input of mux_1 to bit p of the negation unit's result. The output of mux_1 is connected to the d input of dff_22. The input of inv_2 and the AS terminal of mux_1 are both connected to the sign output by the sign determination module, and the output of inv_2 is connected to the BS terminal of mux_1. The clock input clk of dff_22 is connected to mcycle[6], and the reset input rst of dff_22 to the external reset signal. The output of dff_22 is bit p of the first gating latch's output data, p = 0, 1, 2, 3, ..., 15.
The second gating latch comprises 16 structurally identical gating latch units. The q-th gating latch unit comprises selector mux_2, inverter inv_2', and D flip-flop dff_23. The A input of mux_2 is connected to bit q of the look-up table's output, and the B input of mux_2 to bit q of the first gating latch's output. The output of mux_2 is connected to the d input of dff_23. The input of inv_2' and the AS terminal of mux_2 are both connected to the configuration control signal M, and the output of inv_2' is connected to the BS terminal of mux_2. The clock input clk of dff_23 is connected to the cycle control signal from the controller, and the reset input rst of dff_23 to the external reset signal. The output of dff_23 is bit q of the second gating latch's output data.
The invention has the following advantages:
(1) The invention computes tanh function values by table lookup and sigmoid function values by linear-function fitting, and both computation schemes are structurally simple. Under timing control, the sigmoid and tanh operational data paths share some of their functional modules, so the implementation occupies less area than two independent implementations. A functional module in either data path consumes dynamic power only while its cycle control signal is active and none at other times, so the overall circuit has low power consumption. In addition, the invention uses a synchronous clock design, which eases timing checking and verification and makes the device more practical for embedded applications.
(2) When the invention is used to compute a neural network activation function, only one mode control selection signal, a clock and a reset signal are needed to control the generation of function values, so the control flow is simple. If the device is used to compute the values of only one of the functions, it forms a computation pipeline, which can improve the efficiency of activation function computation.
(3) The address generator module and the look-up table module can be easily extended according to the required tanh computational precision, so the device can meet changing function precision requirements.
Brief description of the drawings
Fig. 1 shows the 16-bit floating-point data format;
Fig. 2 shows the overall structure of the invention;
Fig. 3 shows the controller structure;
Fig. 4 shows the structure of the symbol decision module;
Fig. 5 is a flow chart of the opposite-number two's-complement operation;
Fig. 6 shows the structure of the gated latches, where (a) is a one-bit schematic of the first gated latch and (b) is a one-bit schematic of the second gated latch;
Fig. 7 shows the structure of the range detection module;
Fig. 8 is a flow chart of the floating-point comparator;
Fig. 9 shows the structure of the parameter register of 4 words of 32 bits;
Fig. 10 is an operational flow chart of the floating-point multiplier;
Fig. 11 is an operational flow chart of the floating-point adder;
Fig. 12 shows the address generator structure;
Fig. 13 shows the structure of the 4-bit address generation module;
Fig. 14 shows the look-up table structure;
Fig. 15 shows the structure of the opposite-number arithmetic unit.
Detailed description of the embodiments
To make the invention easier to understand, it is described in further detail below with reference to the drawings.
In view of the characteristics of neural network data operations, the data processed by the invention are defined in the following format. Each datum is a 16-bit floating-point number. Within the 16 bits, the four bits [15:12] are the exponent field of the floating-point number, bit 11 is the sign bit, an implied bit lies between bits 11 and 10, and bits [10:0] are the fraction. Every field of the 16-bit floating-point number is represented in two's complement, as shown in Fig. 1. Operations are performed with an implied binary point between bit 11 and bit 10. When the implied most significant non-sign bit is made explicit, it is located immediately to the left of the binary point. In this format the two's-complement floating-point number x is given by:
$x = 01.f \times 2^{e}$ if $s = 0$
$x = 10.f \times 2^{e}$ if $s = 1$
$x = 0$ if $e = -8$
In this short floating-point format, zero must be represented by the following reserved values:
$e = -8$
$s = 0$
$f = 0$
The range and precision of the floating-point format are:
Largest positive number: $x = (2 - 2^{-11}) \times 2^{7} = 2.5594 \times 10^{2}$
Smallest positive number: $x = 1 \times 2^{-7} = 7.8125 \times 10^{-3}$
Smallest-magnitude negative number: $x = (-1 - 2^{-11}) \times 2^{-7} = -7.8163 \times 10^{-3}$
Largest-magnitude negative number: $x = -2 \times 2^{7} = -2.5600 \times 10^{2}$
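To make the format concrete, the following Python sketch decodes one 16-bit word into an ordinary number according to the three cases above. The function name decode_short_float is hypothetical and does not appear in the patent. The sketch relies on the two's-complement reading of the mantissa, in which 01.f equals 1 + f/2^11 and 10.f equals -2 + f/2^11.

```python
def decode_short_float(word: int) -> float:
    """Decode one 16-bit word of the short floating-point format into a Python float."""
    word &= 0xFFFF
    e = (word >> 12) & 0xF          # exponent field, bits [15:12]
    if e >= 8:                      # 4-bit two's complement: 8..15 mean -8..-1
        e -= 16
    s = (word >> 11) & 0x1          # sign bit, bit 11
    f = word & 0x7FF                # fraction, bits [10:0]
    if e == -8:                     # reserved exponent: the value is zero
        return 0.0
    # 01.f is 1 + f/2**11 and 10.f (two's complement) is -2 + f/2**11.
    mantissa = (-2.0 if s else 1.0) + f / 2048.0
    return mantissa * 2.0 ** e


print(decode_short_float(0x77FF))   # largest positive value of the format
print(decode_short_float(0x7800))   # largest-magnitude negative value
```

Note that in this format the all-zero word 0x0000 decodes to 1.0, not 0; zero is encoded by the reserved exponent e = -8, for example 0x8000.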
On this basis, the invention proposes a configurable neural network activation function implementation device. As shown in Fig. 2, it consists mainly of a controller and the operational data paths of the sigmoid function and the tanh function. The device has 4 input signals: the external clock signal clk, the external reset signal rst, the configuration control signal M, and the activation function argument DATA[15:0] (the input operand, which is 16-bit floating-point data in the format described above). It has 2 output signals: the activation function result result[15:0] (16-bit floating-point data) and the operation-complete status signal DONE.
To compute the sigmoid activation function, the configuration control signal M is set to 1. M, the external clock signal clk and the external reset signal rst are input to the controller, which generates the control signals cycle0, mcycle[6:1] and result_clk for the 8 clock cycles of the sigmoid operational data path. The sigmoid operational data path consists of the symbol decision module, the range detection module, the parameter register, the floating-point multiplier, the floating-point adder, the opposite-number arithmetic unit, the first gated latch and the second gated latch. The sigmoid computation proceeds as follows.
The argument DATA[15:0] of the sigmoid activation function first enters the symbol decision module, which determines whether the input operand is positive or negative. If the input value is positive, it is output unchanged; otherwise its opposite number is output, so that the result data_abs[15:0] is the absolute value of the input. The module also outputs the sign neg of DATA: neg = 1 indicates that DATA is negative, and neg = 0 indicates that DATA is positive. data_abs[15:0] and neg are latched under the clock cycle control signal cycle0.
The output data_abs[15:0] of the symbol decision module is input to the range detection module. The value range of data_abs[15:0] is divided into 4 numerical intervals, and range detection determines which interval data_abs[15:0] lies in. According to that interval, the linear function that best approximates the sigmoid activation function there is used to estimate its value. Determining the interval also yields the address range[3:0] of the register holding the coefficient and offset of the linear function, and cycle control signal mcycle[1] latches the range[3:0] output.
range[3:0] is output to the parameter register to select the parameters of the linear function that approximates the sigmoid activation function, namely the first-order coefficient and the offset. The parameter register is 32 bits wide: the upper 16 bits are the first-order coefficient A[15:0] and the lower 16 bits are the offset B[15:0]. Cycle control signal mcycle[2] latches the parameter register outputs A[15:0] and B[15:0].
A[15:0] and data_abs[15:0] are input to the floating-point multiplier, which performs the floating-point multiplication; mcycle[3] latches the multiplier output AX[15:0]. AX[15:0] and the parameter register output B[15:0] are input to the floating-point adder, which performs the floating-point addition; mcycle[4] latches the adder output Y[15:0]. Y[15:0] is input to the opposite-number arithmetic unit, which computes NY[15:0], the opposite number of Y[15:0]; mcycle[5] latches the output NY[15:0].
NY[15:0] and Y[15:0] are input to the first gated latch, which selects its output according to the data sign neg output by the symbol decision module: when neg = 0 it selects the adder result Y[15:0], and when neg = 1 it selects the opposite-number result NY[15:0]. mcycle[6] latches the first gated latch output sigmoid_result[15:0].
sigmoid_result[15:0] is input to the second gated latch, which, according to the value of the configuration control signal M, selects and latches either the sigmoid result or the tanh result. When M = 1 the second gated latch selects sigmoid_result[15:0], and the cycle control signal result_clk latches its output result[15:0], which is the result of the sigmoid activation function. The controller then outputs DONE (active high) to indicate that the sigmoid operation is complete.
To compute the tanh activation function, the configuration control signal M is set to 0. M, the external clock signal clk and the reset signal rst are input to the controller, which generates the control signals cycle0, nmcycle[2:1] and result_clk for the 4 clock cycles of the tanh operational data path. The tanh operational data path consists of the symbol decision module, the address generation module, the look-up table module and the second gated latch. The tanh computation proceeds as follows.
The argument DATA of the tanh activation function first enters the symbol decision module, which determines whether the input operand is positive or negative. If the input value is positive, it is output unchanged; otherwise its opposite number, that is, its absolute value, is output. The result is data_abs[15:0], and the module also outputs the sign neg of DATA: neg = 1 indicates that DATA is negative, and neg = 0 indicates that DATA is positive. data_abs[15:0] and neg are latched under the clock cycle control signal cycle0.
data_abs[15:0] is input to the address generation module. The value range of data_abs[15:0] is divided into 2^n numerical intervals (n depends on the required precision of the tanh function; n = 4 in this description). The address generation module must quickly determine which interval data_abs[15:0] lies in and, according to that interval, form the look-up table address addr[17:0]; cycle control signal nmcycle[1] latches the addr[17:0] output.
addr[17:0] is output to the look-up table to select the tanh function value stored there, and cycle control signal nmcycle[2] latches the function value output. The tanh function value tanh_result[15:0] is input to the second gated latch, which selects and latches its output according to the value of the configuration control signal M. When M = 0 the second gated latch selects the tanh function value, and the cycle control signal result_clk latches its output result[15:0], which is the result of the tanh activation function. The controller then outputs DONE (active high) to indicate that the tanh operation is complete.
Referring to Fig. 2, the symbol decision module and the second gated latch are the modules shared by the sigmoid and tanh operational data paths.
The input signals of the controller (see Fig. 2 and Fig. 3) are the configuration control signal M, the external clock signal clk and the external reset signal rst. Its output signals are the cycle control signals cycle0, mcycle[6:1], nmcycle[3:1] and result_clk, and the operation-complete signal DONE. Its function is to generate, according to the value of the configuration control signal M, the cycle control signals of the corresponding activation function operational data path and the operation-complete flag DONE. When M = 0, the controller generates the 4 latch control signals of the tanh operational data path; when M = 1, it generates the 8 latch control signals of the sigmoid operational data path.
The controller consists mainly of 9 structurally identical D flip-flops; its structure is shown in Fig. 3. The controller comprises D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8; AND gates and21_0, and21_1, and21_2, and21_3, and21_4, and21_5, and21_6, and21_7, and21_8, and21_9 and and21_10; OR gates or21_0 and or21_1; the two-input NOR gate nor21_0; and inverters inv_0 and inv_1. The clock inputs clk of D flip-flops dff_0 to dff_8 are all connected to the external clock signal clk. The data input d of dff_0 is tied to logic "1" (high level). The output q of dff_0 provides the cycle control signal cycle0 and is also connected to the data input d of dff_1. The output q of dff_1 is connected to the data input d of dff_2; the output q of dff_2 to the data input d of dff_3; the output q of dff_3 to the data input d of dff_4; the output q of dff_4 to the data input d of dff_5; the output q of dff_5 to the data input d of dff_6; the output q of dff_6 to the data input d of dff_7; and the output q of dff_7 to the data input d of dff_8.
The input of inverter inv_0 is connected to the configuration control signal M. The output of inv_0 is connected to the first inputs of AND gates and21_0, and21_2 and and21_4. The configuration control signal M is connected to the second inputs of AND gates and21_1, and21_3, and21_5, and21_6, and21_7, and21_8, and21_9 and and21_10. The output q of dff_1 is connected to the second input of and21_0 and to the first input of and21_1. The output q of dff_2 is connected to the second input of and21_2 and to the first input of and21_3. The output q of dff_3 is connected to the second input of and21_4 and to the first input of and21_5. The output q of dff_4 is connected to the first input of and21_6, the output q of dff_5 to the first input of and21_7, the output q of dff_6 to the first input of and21_8, the output q of dff_7 to the first input of and21_9, and the output q of dff_8 to the first input of and21_10.
The output of AND gate and21_0 provides the signal nmcycle[1]; the output of and21_1 provides mcycle[1]; the output of and21_2 provides nmcycle[2]; the output of and21_3 provides mcycle[2]; the output of and21_4 provides nmcycle[3]; the output of and21_5 provides mcycle[3]; the output of and21_6 provides mcycle[4]; the output of and21_7 provides mcycle[5]; the output of and21_8 provides mcycle[6]; and the output of and21_9 provides mcycle[7].
The output of AND gate and21_6 is connected to the first input of OR gate or21_1, and the output of AND gate and21_10 is connected to the second input of or21_1. The output of or21_1 provides the operation-complete signal DONE and is also connected to the first input of the two-input NOR gate nor21_0. The input of inverter inv_1 is connected to the external reset signal rst, and the output of inv_1 is connected to the second input of nor21_0. The output of nor21_0 is connected to the reset inputs rst of D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8. The output of AND gate and21_5 is connected to the first input of OR gate or21_0, the output of AND gate and21_9 is connected to the second input of or21_0, and the output of or21_0 provides the signal result_clk.
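The timing behaviour implied by the flip-flop chain and the AND gating can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the function name controller_phases is hypothetical, all flip-flops are assumed to start cleared, and the DONE, reset and result_clk logic is deliberately left out. Because dff_0 samples a constant 1, the chain fills with ones from the left, so each control signal goes high one clock cycle after the previous one.

```python
def controller_phases(m: int, clocks: int) -> dict:
    """Model the 9-stage D flip-flop chain and the AND gating by M.

    Reset, DONE and result_clk logic are not modelled in this sketch.
    """
    q = [0] * 9                      # dff_0 .. dff_8, assumed cleared at start
    for _ in range(clocks):
        # Every stage samples its predecessor on the same clock edge;
        # dff_0 samples the constant 1.
        q = [1] + q[:-1]
    not_m = 1 - m                    # output of inverter inv_0
    signals = {"cycle0": q[0]}
    for i in (1, 2, 3):              # and21_0, and21_2, and21_4
        signals[f"nmcycle[{i}]"] = not_m & q[i]
    for i in range(1, 8):            # and21_1, and21_3, and21_5 .. and21_9
        signals[f"mcycle[{i}]"] = m & q[i]
    return signals


# With M = 1 only the mcycle signals can go high; with M = 0 only nmcycle.
print(controller_phases(1, 3))
print(controller_phases(0, 2))
```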
The input signals of the symbol decision module are the input data DATA[15:0], the cycle control signal cycle0 and the reset signal rst. Its output signals are the sign flag neg of the data and the absolute value data_abs[15:0]. Its function is as follows. When the sign bit of the input data, bit 11, is DATA[11] = 0, the input data is positive, and the data is latched and output when cycle0 is active. When DATA[11] = 1, the input data is negative, and the two's-complement opposite number of the data is latched and output when cycle0 is active. The sign bit DATA[11] of the input data is latched when cycle0 is active and output as neg. When the reset signal rst is active, the output signals data_abs[15:0] and neg are cleared to 0.
The structure of the symbol decision module is shown in Fig. 4. The 16-bit input data is connected to the input DIN of the opposite-number complement arithmetic unit DD1 and to the D2 input of the 16-bit data latch. The output DOUT of the opposite-number complement arithmetic unit D1 is connected to the D1 input of the 16-bit data latch. Bit 11 of the 16-bit input data, DATA[11], is connected to the latch control signal ctrl and to the data input d of D flip-flop dff_9. The cycle control signal cycle0 is connected to the clock input clk of the latch and to the clock input clk of dff_9. The external reset signal rst is connected to the reset input rst of the 16-bit data latch and to the reset input rst of dff_9.
The operation flow of the opposite-number complement arithmetic unit (see Fig. 5) computes the two's-complement opposite number of the 16-bit floating-point data DIN[15:0] in 13 steps.
Step 1: compute the two's-complement opposite number of the mantissa of the 16-bit floating-point number DIN[15:0]. Following the rules of floating-point arithmetic, DIN[15:0] is first expanded into the 18-bit operand D[17:0]. The 4-bit exponent DIN[15:12] keeps its width and corresponds to D[17:14]. The 12-bit mantissa DIN[11:0] is extended to the 14 bits D[13:0], where D[10:0] = DIN[10:0]; D[11] = ~DIN[11] is the implied bit of the floating-point format made explicit; D[12] = DIN[11] is the sign bit of the mantissa; and D[13] = DIN[11] is a one-bit sign extension of the mantissa. After the expansion, D[13:0] is inverted bit by bit and 1 is added at the least significant bit; the result is D_MID[13:0]. Steps 2, 3 and 4 are then carried out simultaneously to examine D_MID[13:0].
Step 2: determine whether the mantissa result D_MID[13:0] is 0; if so, go to step 5.
Step 3: determine whether the mantissa result D_MID[13:0] has overflowed. D_MID[13] xor D_MID[12] = 1 indicates overflow; in that case, go to step 6.
Step 4: if D_MID[13:0] has not overflowed, that is, D_MID[13] xor D_MID[12] = 0, go to step 7.
Step 5: when the mantissa result D_MID[13:0] is 0, set the exponent D_MID[17:14] to the most negative value so that 0 is represented correctly; then carry out steps 8, 9 and 10 simultaneously to examine D_MID[17:14].
Step 6: when the mantissa result D_MID[13:0] has overflowed, shift the mantissa D_MID[13:0] right by one bit and add 1 to the exponent DIN[15:12]; then carry out steps 8, 9 and 10 simultaneously to examine D_MID[17:14].
Step 7: when the mantissa result D_MID[13:0] has not overflowed, shift the mantissa D_MID[13:0] left by K bits and subtract K from the exponent DIN[15:12]; then carry out steps 8, 9 and 10 simultaneously to examine D_MID[17:14].
Step 8: determine whether the exponent result D_MID[17:14] has overflowed in the positive direction; if so, go to step 11.
Step 9: determine whether the exponent result D_MID[17:14] has overflowed in the negative direction; if so, go to step 12.
Step 10: determine whether the exponent result D_MID[17:14] lies within the representable range; if so, go to step 13.
Step 11: if D_MID[13:0] > 0, set D_MID to the largest positive floating-point value; if D_MID[13:0] < 0, set D_MID to the most negative floating-point value; then go to step 13.
Step 12: set D_MID[13:0] to 0 and D_MID[17:14] to -8; then go to step 13.
Step 13: convert the 18-bit operand D_MID to the 16-bit floating-point format by removing the sign-extension bit and the implied bit, and output the 16-bit floating-point data DOUT[15:0], where DOUT[15:12] = D_MID[17:14], DOUT[11] = D_MID[12] and DOUT[10:0] = D_MID[10:0].
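The 13 steps can be condensed into the following bit-level Python sketch. It is an illustration under stated assumptions, not the circuit itself: the names bit and negate_short_float are hypothetical, the shift count K of step 7 is taken to be the number of left shifts needed until bits 12 and 11 differ, and an exponent of -8 or below after normalization is treated as negative overflow because -8 is reserved for zero.

```python
def bit(value: int, index: int) -> int:
    """Return bit number index of value."""
    return (value >> index) & 1


def negate_short_float(din: int) -> int:
    """Return the 16-bit word encoding -x for the 16-bit floating-point word din."""
    exp = (din >> 12) & 0xF
    if exp >= 8:                                  # signed 4-bit exponent
        exp -= 16
    s = bit(din, 11)
    # Step 1: D[13] = D[12] = s, D[11] = ~s (implied bit), D[10:0] = fraction.
    d = (s << 13) | (s << 12) | ((1 - s) << 11) | (din & 0x7FF)
    d_mid = (-d) & 0x3FFF                         # invert all bits and add 1
    if d_mid == 0:                                # steps 2 and 5
        return 0x8000                             # e = -8, s = 0, f = 0
    if bit(d_mid, 13) ^ bit(d_mid, 12):           # steps 3 and 6: overflow
        d_mid = (d_mid >> 1) | (d_mid & 0x2000)   # shift right, keep the sign
        exp += 1
    else:                                         # steps 4 and 7: normalize
        while bit(d_mid, 12) == bit(d_mid, 11):
            d_mid = (d_mid << 1) & 0x3FFF
            exp -= 1
    if exp > 7:                                   # steps 8 and 11: saturate
        return 0x77FF if bit(d_mid, 13) == 0 else 0x7800
    if exp <= -8:                                 # steps 9 and 12: result is zero
        return 0x8000
    # Step 13: drop the sign extension and the implied bit.
    return ((exp & 0xF) << 12) | (bit(d_mid, 12) << 11) | (d_mid & 0x7FF)


print(hex(negate_short_float(0x0000)))   # the word 0x0000 encodes 1.0
```

For example, negating 1.0 (word 0x0000) takes the non-overflow path with one left shift and gives -2 x 2^-1, while negating -256 (word 0x7800) overflows the exponent and saturates at the largest positive value.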
The first gated latch and the second gated latch are both 16-bit parallel structures.
The first gated latch comprises 16 structurally identical gating latch units.
The p-th gating latch unit comprises a selector mux_1, an inverter inv_2 and a D flip-flop dff_22. The A input of selector mux_1 is connected to bit p of the floating-point adder result, and the B input of mux_1 is connected to bit p of the opposite-number arithmetic unit result. The output of mux_1 is connected to the d input of D flip-flop dff_22. The input of inverter inv_2 and the AS terminal of mux_1 are both connected to the data sign output by the symbol decision module, and the output of inv_2 is connected to the BS terminal of mux_1. The clock input clk of dff_22 is connected to mcycle[6], and the reset input rst of dff_22 is connected to the external reset signal. The output of dff_22 provides bit p of the output data of the first gated latch, p = 0, 1, 2, 3, ..., 15, as shown in Fig. 6(a).
Its function is as follows. When the data sign output by the symbol decision module is neg = 0, D1[p] is selected, and the selected data is latched when clk is active and output as D3[p]. When neg = 1, D2[p] is selected, and the selected data is latched when clk is active and output as D3[p].
The second gated latch comprises 16 structurally identical gating latch units.
The q-th gating latch unit comprises a selector mux_2, an inverter inv_2' and a D flip-flop dff_23. The A input of selector mux_2 is connected to bit q of the look-up table output, and the B input of mux_2 is connected to bit q of the first gated latch output. The output of mux_2 is connected to the d input of D flip-flop dff_23. The input of inverter inv_2' and the AS terminal of mux_2 are both connected to the configuration control signal M, and the output of inv_2' is connected to the BS terminal of mux_2. The clock input clk of dff_23 is connected to the cycle control signal of the controller, and the reset input rst of dff_23 is connected to the external reset signal. The output of dff_23 provides bit q of the output data of the second gated latch, as shown in Fig. 6(b).
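The behaviour of both gated latches can be modelled with one small Python class, sketched below. The class name GatedLatch16 and its method names are hypothetical. The model assumes that the select input picks the A input when it is 0 and the B input when it is 1, which matches the stated function of the first gated latch (neg = 0 selects the adder result) and the described use of the second one (M = 0 selects the look-up table output).

```python
class GatedLatch16:
    """16 parallel one-bit units: a 2-to-1 selector feeding a D flip-flop."""

    def __init__(self) -> None:
        self.q = 0                              # flip-flop outputs after reset

    def reset(self) -> None:
        """Clear all 16 flip-flops, as the external reset signal would."""
        self.q = 0

    def clock(self, a: int, b: int, sel: int) -> int:
        """On the active clock edge, latch A when sel is 0 and B when sel is 1."""
        out = 0
        for p in range(16):                     # one selector and flip-flop per bit
            chosen = b if sel else a
            out |= ((chosen >> p) & 1) << p
        self.q = out
        return self.q


# First gated latch: A = adder result Y, B = opposite number NY, sel = neg.
first = GatedLatch16()
print(hex(first.clock(0x0400, 0x0C00, 1)))
# Second gated latch: A = look-up table output, B = first latch output, sel = M.
second = GatedLatch16()
print(hex(second.clock(0x1234, first.q, 1)))
```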
The input signals of the range detection module are the timing control signal mcycle[1], the reset signal rst and the 16-bit floating-point data data_abs. Its output signal is the 4-bit range signal range[3:0]. Its function is as follows. When the reset signal is active, constant C0 is set to decimal 5.0000, constant C1 to decimal 1.0000 and constant C2 to decimal 1.0000. The input data is compared with C0, C1 and C2 simultaneously by floating-point comparators. When the input floating-point data data_abs is greater than C0, range[3] is 0 and range[2], range[1] and range[0] are 1. When data_abs is greater than C1 but not greater than C0, range[2] is 0 and range[3], range[1] and range[0] are 1. When data_abs is greater than C2 but not greater than C1, range[1] is 0 and range[3], range[2] and range[0] are 1. When data_abs is less than C0, C1 and C2, range[3], range[2] and range[1] are 1 and range[0] is 0.
The structure of the range detection module is shown in Fig. 7. It comprises floating-point comparators fcom_1, fcom_2 and fcom_3; inverters inv_3, inv_4 and inv_5; the two-input OR gate or21_2; the three-input OR gates or31_0 and or31_1; and D flip-flops dff_11, dff_12, dff_13 and dff_14.
The first input of floating-point comparator fcom_1 is connected to the data data_abs[15:0] output by the symbol decision module, and its second input is connected to constant C0, which is decimal 5.0000.
The first input of floating-point comparator fcom_2 is connected to the data data_abs[15:0] output by the symbol decision module, and its second input is connected to constant C1, which is decimal 1.0000.
The first input of floating-point comparator fcom_3 is connected to the data data_abs[15:0] output by the symbol decision module, and its second input is connected to constant C2, which is decimal 1.0000.
The output of fcom_1 is connected to the data input d of D flip-flop dff_11 and to the input of inverter inv_3. The output of inv_3 is connected to the first input of the two-input OR gate or21_2, the first input of the three-input OR gate or31_0 and the first input of the three-input OR gate or31_1. The output of fcom_2 is connected to the input of inverter inv_4 and to the second input of or21_2. The output of or21_2 is connected to the data input d of D flip-flop dff_12. The output of inv_4 is connected to the second input of or31_0 and to the second input of or31_1. The output of fcom_3 is connected to the third input of or31_0 and to the input of inverter inv_5. The output of inv_5 is connected to the third input of or31_1. The output of or31_0 is connected to the data input d of D flip-flop dff_13, and the output of or31_1 is connected to the data input d of D flip-flop dff_14.
The reset inputs rst of D flip-flops dff_11, dff_12, dff_13 and dff_14 are connected to the external reset signal rst, and their clock inputs clk are connected to mcycle[1]. The output q of dff_14 provides range[0], the output q of dff_13 provides range[1], the output q of dff_12 provides range[2], and the output q of dff_11 provides range[3].
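The gate network between the three comparators and the four flip-flops can be checked with a few lines of Python. This is a sketch of the combinational logic only; the function name range_flags is hypothetical, and each comparator output is taken to be 0 when data_abs is greater than its constant and 1 when it is less, as stated for the floating-point comparator. The result is an active-low one-hot code in which exactly one bit of range[3:0] is 0.

```python
def range_flags(f1: int, f2: int, f3: int) -> tuple:
    """Return (range[3], range[2], range[1], range[0]) from the comparator outputs.

    f1, f2, f3 are the outputs of fcom_1, fcom_2, fcom_3:
    0 when data_abs is greater than C0, C1, C2 respectively, 1 when it is less.
    """
    n1, n2, n3 = 1 - f1, 1 - f2, 1 - f3   # inverters inv_3, inv_4, inv_5
    r3 = f1                               # fcom_1 output into dff_11
    r2 = n1 | f2                          # or21_2 into dff_12
    r1 = n1 | n2 | f3                     # or31_0 into dff_13
    r0 = n1 | n2 | n3                     # or31_1 into dff_14
    return (r3, r2, r1, r0)


for outputs in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(outputs, range_flags(*outputs))
```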
The floating-point comparator flow (see Fig. 8) compares the 16-bit number data_abs[15:0] with one constant Ci (i = 0, 1, 2) in 5 steps.
Step 1: compare the mantissa of the first-input data data_abs with the mantissa of the second-input constant Ci, where i = 0, 1, 2:
according to the floating-point data format, make the implied bit of each mantissa explicit and extend each mantissa by one sign bit;
subtract the mantissa of Ci from the mantissa of data_abs and denote the result dm[13:0].
Step 2: compare the exponent of the first-input data data_abs with the exponent of the second-input constant Ci:
extend each exponent by one sign bit, subtract the exponent of Ci from the exponent of data_abs, and denote the result de[4:0].
Step 3: determine the sign of the mantissa comparison result dm[13:0]:
compute sdm as sdm = dm[13] xor dm[12];
when sdm is 0, dm[13:0] is positive; when sdm is 1, dm[13:0] is negative.
Step 4: determine the sign of the exponent comparison result de:
compute sde as sde = de[4] xor de[3];
when sde is 0, de[4:0] is positive; when sde is 1, de[4:0] is negative.
Step 5: decide the comparison result. If sde is 0 and de[4:0] is not 0, the first-input data data_abs is greater than the second-input constant Ci and the output is 0. If sde is 0, de[4:0] is 0 and sdm is 0, data_abs is greater than Ci and the output is 0. If sde is 0, de[4:0] is 0 and sdm is 1, data_abs is less than Ci and the output is 1. If sde is 1, data_abs is less than Ci and the output is 1.
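The decision rule of step 5 (compare exponents first, then mantissas) can be illustrated at the value level with the Python sketch below. It is a simplification under stated assumptions: the function name fcom mirrors the comparator instances but the code is hypothetical, ordinary signed integer subtraction replaces the bit-level sign tests of steps 3 and 4, and both operands are assumed to be non-negative numbers in the 16-bit format, which holds for data_abs and the positive constants.

```python
def fcom(data_abs: int, const: int) -> int:
    """Return 0 when data_abs >= const and 1 when data_abs < const.

    Both arguments are 16-bit words of the floating-point format and are
    assumed to encode non-negative values.
    """

    def fields(word: int):
        e = (word >> 12) & 0xF
        if e >= 8:                   # signed 4-bit exponent
            e -= 16
        s = (word >> 11) & 1
        # Mantissa with the implied bit made explicit, scaled by 2**11:
        # 01.f for s = 0 and 10.f (two's complement) for s = 1.
        m = (1 - s) * 2048 - s * 4096 + (word & 0x7FF)
        return e, m

    e_x, m_x = fields(data_abs)
    e_c, m_c = fields(const)
    dm = m_x - m_c                   # step 1: mantissa difference
    de = e_x - e_c                   # step 2: exponent difference
    if de > 0:                       # larger exponent: data_abs is greater
        return 0
    if de == 0 and dm >= 0:          # equal exponents: mantissas decide
        return 0
    return 1


C0 = 0x2200                          # 1.25 x 2**2 = 5.0 in the 16-bit format
print(fcom(0x2400, C0))              # 0x2400 encodes 1.5 x 2**2 = 6.0
print(fcom(0x0000, C0))              # 0x0000 encodes 1.0
```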
The input signals of the 4-word, 32-bit parameter register (see Fig. 2 and Fig. 9) are the range interval identification signal range[3:0], the clock signal clk and the reset signal rst; the output signals are the first-order coefficient A[15:0] and the offset B[15:0] of the linear function that approximates the sigmoid function. Its function is as follows: when rst is active, the parameter register is initialized, and each of its four 32-bit words is set to the first-order coefficient and offset of one of the four linear functions corresponding to the four intervals into which the input argument of the sigmoid function is divided. In the order given by the range interval identification signal range[3:0], the pairs (A, B)_10 are: (0.0000, 0.0000)_10; (0.0313, 0.8438)_10; (0.1250, 0.6250)_10; (0.2500, 0.5000)_10. The corresponding 32-bit word is selected according to the range interval identification signal range; it holds the parameters of the linear function that approximates the sigmoid activation function, namely the first-order coefficient A[15:0] and the offset B[15:0], which are latched and output on the clock signal. The parameter register comprises four 32-bit tri-state control gates and a 32-bit D flip-flop dff_15. The control signal terminal of the j-th 32-bit tri-state control gate is connected to bit j of the range interval identification signal output by the range detection module, j = 0, 1, 2, 3; the reset terminals of the four 32-bit tri-state control gates are all connected to the external reset signal rst; the outputs of the four 32-bit tri-state control gates are all connected to the data input d of the 32-bit D flip-flop dff_15; the reset input rst of dff_15 is connected to the external reset signal rst; the clock input clk of dff_15 is connected to mcycle[2]; of the output q of dff_15, the upper 16 bits are the first-order coefficient A[15:0] and the lower 16 bits are the offset B[15:0].
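The 32-bit word layout described above (upper 16 bits A, lower 16 bits B) can be illustrated with a small sketch. The helper names pack_param_word and unpack_param_word are hypothetical, and the 16-bit values are treated here as opaque bit patterns rather than decoded floating-point numbers.

```python
def pack_param_word(a16: int, b16: int) -> int:
    """Pack coefficient A[15:0] into the upper half and offset B[15:0]
    into the lower half of one 32-bit parameter word."""
    return ((a16 & 0xFFFF) << 16) | (b16 & 0xFFFF)


def unpack_param_word(word: int) -> tuple:
    """Split a 32-bit parameter word back into (A[15:0], B[15:0])."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF
```

For example, unpack_param_word(pack_param_word(a, b)) returns (a, b) for any two 16-bit patterns.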
The operation flow of the floating-point multiplier (see Figure 10) multiplies the two 16-bit floating-point values A[15:0] and data_abs[15:0] in 5 steps.
Step 1: prepare the operands. Convert the 16-bit floating-point data A[15:0] into the 18-bit operand AIN[17:0] as follows:
AIN[17:14] = A[15:12], AIN[13] = A[11], AIN[12] = A[11], AIN[11] = ~A[11], AIN[10:0] = A[10:0];
At the same time, convert the 16-bit floating-point data data_abs[15:0] into the 18-bit operand XIN[17:0]. If range[3] = 1, XIN[17:0] is formed as follows: {XIN[17:14] = data_abs[15:12], XIN[13] = data_abs[11], XIN[12] = data_abs[11], XIN[11] = ~data_abs[11], XIN[10:0] = data_abs[10:0]}; if range[3] = 0, XIN[17:0] is set to the constant 1.
Step 2: multiply the data. Multiply the mantissas of AIN and XIN to obtain the mantissa of AXIN: AXIN[27:0] = AIN[13:0] × XIN[13:0];
Add the exponents of AIN and XIN to obtain the exponent of AXIN: AXIN[31:28] = AIN[17:14] + XIN[17:14].
Step 3: check the mantissa result AXIN[27:0]. If AXIN[27:14] = 0, then AXIN[31:28] = -8; if AXIN[27:14] >> 1 is a normalized number, then {AXIN[27:14] = AXIN[27:14] >> 1, AXIN[31:28] = AXIN[31:28] + 1}; if AXIN[27:14] >> 2 is a normalized number, then {AXIN[27:14] = AXIN[27:14] >> 2, AXIN[31:28] = AXIN[31:28] + 2}; otherwise AXIN[27:14] is already a normalized number.
Step 4: check the exponent result AXIN[31:28]. If AXIN[31:28] overflows and AXIN[27:14] > 0, then {AXIN[31:28] = 7}; if AXIN[31:28] overflows and AXIN[27:14] < 0, then {AXIN[31:28] = -8}; if AXIN[31:28] underflows, then {AXIN[31:28] = -8, AXIN[27:14] = 0}; otherwise AXIN[31:28] is within the valid range.
Step 5: latch and output the product. If rst = 1 and a rising edge of clk occurs, then {AX[15:12] = AXIN[31:28], AX[11] = AXIN[27], AX[10:0] = AXIN[25:15]}, and the output result is AX[15:0] = A[15:0] × data_abs[15:0].
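The Step 1 operand preparation can be sketched as a bit-level conversion. This is a minimal illustration of the field mapping stated in the text (exponent copied, sign bit duplicated, inverted sign inserted at bit 11, low 11 bits copied); the function name to_18bit is hypothetical.

```python
def to_18bit(a: int) -> int:
    """Expand a 16-bit floating-point pattern A[15:0] into the 18-bit
    operand AIN[17:0] as listed in Step 1."""
    exp = (a >> 12) & 0xF      # A[15:12] -> AIN[17:14]
    sign = (a >> 11) & 0x1     # A[11]    -> AIN[13] and AIN[12]
    low = a & 0x7FF            # A[10:0]  -> AIN[10:0]
    inv_sign = sign ^ 0x1      # ~A[11]   -> AIN[11]
    return (exp << 14) | (sign << 13) | (sign << 12) | (inv_sign << 11) | low
```

The same mapping applies to data_abs[15:0] when range[3] = 1, and to AX[15:0] and B[15:0] in the floating-point adder.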
The operation flow of the floating-point adder (see Figure 11) adds the two 16-bit floating-point values AX[15:0] and B[15:0] in 7 steps.
Step 1: prepare the operands. Convert the 16-bit floating-point data AX[15:0] into the 18-bit operand AXIN[17:0] as follows: AXIN[17:14] = AX[15:12], AXIN[13] = AX[11], AXIN[12] = AX[11], AXIN[11] = ~AX[11], AXIN[10:0] = AX[10:0].
Convert the 16-bit floating-point data B[15:0] into the 18-bit operand BIN[17:0] as follows: BIN[17:14] = B[15:12], BIN[13] = B[11], BIN[12] = B[11], BIN[11] = ~B[11], BIN[10:0] = B[10:0].
Step 2: compute the exponent difference of the two floating-point operands: N = AXIN[17:14] - BIN[17:14].
Step 3: align the mantissas of the two floating-point operands according to the exponent difference: if N > 0, then {BIN[13:0] = BIN[13:0] >> N, YIN[17:14] = AXIN[17:14]}; otherwise {AXIN[13:0] = AXIN[13:0] >> |N|, YIN[17:14] = BIN[17:14]}.
Step 4: add the mantissas of the two floating-point operands: YIN[13:0] = AXIN[13:0] + BIN[13:0].
Step 5: check the mantissa sum YIN[13:0]. If YIN[13:0] = 0, then YIN[17:14] = -8; if YIN[13:0] overflows, then {YIN[13:0] = YIN[13:0] >> 1, YIN[17:14] = YIN[17:14] + 1}; if YIN[13:0] << K becomes a normalized number, then {YIN[13:0] = YIN[13:0] << K, YIN[17:14] = YIN[17:14] - K}.
Step 6: check the exponent of the sum YIN[17:14]. If YIN[17:14] overflows and YIN[13:0] > 0, then {YIN[17:14] = 7}; if YIN[17:14] overflows and YIN[13:0] < 0, then {YIN[17:14] = -8}; if YIN[17:14] underflows, then {YIN[17:14] = -8, YIN[13:0] = 0}; otherwise YIN[17:14] is within the valid range.
Step 7: latch and output the sum. If rst = 1 and a rising edge of clk occurs, then {Y[15:12] = YIN[17:14], Y[11] = YIN[13], Y[10:0] = YIN[10:0]}, and the output is Y[15:0] = AX[15:0] + B[15:0].
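Steps 2 and 3 of the adder (exponent difference and mantissa alignment) can be sketched as follows. This is a simplified model under stated assumptions: exponents and mantissas are passed as signed Python integers rather than as 4-bit and 14-bit fields, and the function name align_mantissas is hypothetical. Python's >> on a negative integer is an arithmetic shift, which is assumed here to correspond to the sign-extended shift of the hardware.

```python
def align_mantissas(ax_exp: int, ax_man: int, b_exp: int, b_man: int):
    """Align two mantissas to the larger exponent.

    Returns (y_exp, ax_man, b_man) where y_exp is the shared exponent
    carried forward as YIN[17:14].
    """
    n = ax_exp - b_exp          # Step 2: exponent difference N
    if n > 0:
        b_man >>= n             # Step 3: shift the smaller operand right
        y_exp = ax_exp
    else:
        ax_man >>= -n           # |N| may be 0, in which case nothing moves
        y_exp = b_exp
    return y_exp, ax_man, b_man
```

After alignment, Step 4 is a plain integer addition of the two returned mantissas.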
The input signals of the address generator (see Figure 12 and Figure 13) are the clock nmcycle[1], the reset signal rst and the 16-bit data data_abs[15:0]; the output signal is the 17-bit address addr[16:0]. Its function is to generate the address for the look-up table: a 17-bit address value is generated according to the interval in which the data value lies. The 16-bit input data data_abs[15:0] is compared simultaneously in the four 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4. When 6.5 ≤ data_abs[15:0] ≤ 8, one bit of the output addr[3:0] of addrgen1 is 0 and the address outputs of the other address generation modules are all 1; when 4.5 ≤ data_abs[15:0] ≤ 6, one bit of the output addr[3:0] of addrgen2 is 0 and the address outputs of the other address generation modules are all 1; when 2.5 ≤ data_abs[15:0] ≤ 4, one bit of the output addr[3:0] of addrgen3 is 0 and the address outputs of the other address generation modules are all 1; when 0.5 ≤ data_abs[15:0] ≤ 2, one bit of the output addr[3:0] of addrgen4 is 0 and the address outputs of the other address generation modules are all 1. The structure of the address generator is shown in Figure 12: the address generator comprises four 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4;
The cin terminal of addrgen1 is grounded; the cout terminal of addrgen1 is connected to the cin terminal of addrgen2, the cout terminal of addrgen2 is connected to the cin terminal of addrgen3, and the cout terminal of addrgen3 is connected to the cin terminal of addrgen4; the clk terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to nmcycle[1]; the rst terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to the external reset signal rst; the d[15:0] terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to the data output by the sign judgment module. The d3[15:0] terminal of addrgen1 is connected to the floating-point number corresponding to the constant (8)_10, the d2[15:0] terminal to the floating-point number corresponding to the constant (7.5)_10, the d1[15:0] terminal to the floating-point number corresponding to the constant (7)_10, and the d0[15:0] terminal to the floating-point number corresponding to the constant (6.5)_10; the output out[3:0] of addrgen1 outputs the address bits addr[16:13];
The d3[15:0] terminal of addrgen2 is connected to the floating-point number corresponding to the constant (6)_10, the d2[15:0] terminal to the floating-point number corresponding to the constant (5.5)_10, the d1[15:0] terminal to the floating-point number corresponding to the constant (5)_10, and the d0[15:0] terminal to the floating-point number corresponding to the constant (4.5)_10; the output out[3:0] of addrgen2 outputs the address bits addr[12:9];
The d3[15:0] terminal of addrgen3 is connected to the floating-point number corresponding to the constant (4)_10, the d2[15:0] terminal to the floating-point number corresponding to the constant (3.5)_10, the d1[15:0] terminal to the floating-point number corresponding to the constant (3)_10, and the d0[15:0] terminal to the floating-point number corresponding to the constant (2.5)_10; the output out[3:0] of addrgen3 outputs the address bits addr[8:5];
The d3[15:0] terminal of addrgen4 is connected to the floating-point number corresponding to the constant (2)_10, the d2[15:0] terminal to the floating-point number corresponding to the constant (1.5)_10, the d1[15:0] terminal to the floating-point number corresponding to the constant (1)_10, and the d0[15:0] terminal to the floating-point number corresponding to the constant (0.5)_10; the output out[3:0] of addrgen4 outputs the address bits addr[4:1], and the cout terminal of addrgen4 outputs addr[0].
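The overall effect of the four cascaded modules can be sketched behaviourally. The sketch below is one reading of the description, not a verified model of the patented circuit: it assumes the 17-bit address is active-low one-hot, that address bit k (k = 1 ... 16) is tied to the threshold 0.5 × k, that bit 0 is cleared only when the input is below 0.5, and that the input is a non-negative Python float rather than a 16-bit floating-point pattern. The function name generate_address is hypothetical.

```python
def generate_address(x: float) -> int:
    """Behavioural sketch of the 17-bit look-up address for |x| >= 0.

    Exactly one bit of the result is 0; all other bits are 1.
    """
    all_ones = (1 << 17) - 1
    if x < 0.5:
        bit = 0                          # below the lowest threshold
    else:
        bit = min(int(x / 0.5), 16)      # highest threshold not above x
    return all_ones & ~(1 << bit)
```

Under these assumptions an input of 3.2 clears bit 6 (threshold 3.0), and any input of 8 or more clears bit 16.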
The input signals of the 4-bit address generation module (see Figure 13) are the clock signal nmcycle[1], the 16-bit data data_abs[15:0], the reset signal rst, the 16-bit data d0, d1, d2 and d3, and the carry input CIN; the output signals are the 4-bit address addr[3:0] and the carry output cout. Its function is as follows: the 16-bit input data data_abs[15:0] is compared simultaneously with the four division constants d0, d1, d2 and d3 (with d0 < d1 < d2 < d3) of the tanh argument interval. When data_abs[15:0] ≥ d3, addr[3] is 0, the other address outputs are 1, and the cascade output control cout is 1; when d2 ≤ data_abs[15:0] < d3, addr[2] is 0, the other address outputs are 1, and cout is 1; when d1 ≤ data_abs[15:0] < d2, addr[1] is 0, the other address outputs are 1, and cout is 1; when d0 ≤ data_abs[15:0] < d1, addr[0] is 0, the other address outputs are 1, and cout is 1; when the cascade input control signal CIN is active, all bits of addr[3:0] are 1 and cout is 1. The structure of the 4-bit address generation module is shown in Figure 13: the 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4 have the same structure, each comprising floating-point comparators fcom_4, fcom_5, fcom_6 and fcom_7, inverters inv_6, inv_7, inv_8 and inv_9, two-input OR gate or21_3, three-input OR gate or31_2, four-input OR gate or41_0, five-input OR gates or51_0 and or51_1, and D flip-flops dff_16, dff_17, dff_18 and dff_19;
Each of the floating-point comparators fcom_4, fcom_5, fcom_6 and fcom_7 has two inputs;
The first input of floating-point comparator fcom_4 serves as the d[15:0] terminal of the 4-bit address generation module, and its second input serves as the d3[15:0] terminal of the 4-bit address generation module;
The first input of floating-point comparator fcom_5 is connected to the first input of floating-point comparator fcom_4, and its second input serves as the d2[15:0] terminal of the 4-bit address generation module;
The first input of floating-point comparator fcom_6 is connected to the first input of floating-point comparator fcom_4, and its second input serves as the d1[15:0] terminal of the 4-bit address generation module;
The first input of floating-point comparator fcom_7 is connected to the first input of floating-point comparator fcom_4, and its second input serves as the d0[15:0] terminal of the 4-bit address generation module;
The first inputs of two-input OR gate or21_3, three-input OR gate or31_2, four-input OR gate or41_0, five-input OR gate or51_0 and five-input OR gate or51_1 are connected together and serve as the cin terminal of the 4-bit address generation module;
The output of floating-point comparator fcom_4 is connected to both the input of inverter inv_6 and the second input of two-input OR gate or21_3, and the output of or21_3 is connected to the input d of D flip-flop dff_16;
The output of floating-point comparator fcom_5 is connected to both the input of inverter inv_7 and the third input of three-input OR gate or31_2, the output of inverter inv_6 is connected to the second input of or31_2, and the output of or31_2 is connected to the input d of D flip-flop dff_17;
The output of floating-point comparator fcom_6 is connected to both the input of inverter inv_8 and the fourth input of four-input OR gate or41_0, the output of inverter inv_6 is connected to the second input of or41_0, the output of inverter inv_7 is connected to the third input of or41_0, and the output of or41_0 is connected to the input d of D flip-flop dff_18;
The output of floating-point comparator fcom_7 is connected to both the input of inverter inv_9 and the fifth input of five-input OR gate or51_0, the output of inverter inv_6 is connected to the second input of or51_0, the output of inverter inv_7 is connected to the third input of or51_0, the output of inverter inv_8 is connected to the fourth input of or51_0, and the output of or51_0 is connected to the input d of D flip-flop dff_19;
The output of inverter inv_6 is connected to the second input of five-input OR gate or51_1, the output of inverter inv_7 is connected to the third input of or51_1, the output of inverter inv_8 is connected to the fourth input of or51_1, the output of inverter inv_9 is connected to the fifth input of or51_1, and the output of or51_1 serves as the cout terminal of the 4-bit address generation module;
The reset inputs rst of D flip-flops dff_16, dff_17, dff_18 and dff_19 are connected together and serve as the rst terminal of the 4-bit address generation module, and their clock inputs clk are connected together and serve as the clk terminal of the 4-bit address generation module; D flip-flop dff_16 outputs address bit 3, D flip-flop dff_17 outputs address bit 2, D flip-flop dff_18 outputs address bit 1, and D flip-flop dff_19 outputs address bit 0.
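The combinational part of one 4-bit address generation module can be modelled gate by gate from the connections listed above. The sketch makes two assumptions that the text here does not spell out for fcom_4 to fcom_7: each comparator outputs 1 when its first input is less than the constant and 0 otherwise, and inputs are ordinary Python numbers instead of 16-bit floating-point patterns. The D flip-flops (clocking and reset) are omitted, and the name addrgen4bit is hypothetical.

```python
def addrgen4bit(x, d0, d1, d2, d3, cin):
    """Return (addr, cout) for one 4-bit address generation module.

    addr is the 4-bit value addr[3:0]; cin and cout are 0 or 1.
    """
    # Comparator outputs: 1 when x is below the constant.
    lt3, lt2, lt1, lt0 = int(x < d3), int(x < d2), int(x < d1), int(x < d0)
    # Inverter outputs inv_6 .. inv_9: 1 when x is at or above the constant.
    ge3, ge2, ge1, ge0 = 1 - lt3, 1 - lt2, 1 - lt1, 1 - lt0

    a3 = cin | lt3                          # or21_3
    a2 = cin | ge3 | lt2                    # or31_2
    a1 = cin | ge3 | ge2 | lt1              # or41_0
    a0 = cin | ge3 | ge2 | ge1 | lt0        # or51_0
    cout = cin | ge3 | ge2 | ge1 | ge0      # or51_1

    return (a3 << 3) | (a2 << 2) | (a1 << 1) | a0, cout
```

With the addrgen1 constants (6.5, 7, 7.5, 8) and cin = 0, an input of 7.2 gives addr = 0b1101 (only addr[1] is 0) and cout = 1, which matches the case d1 ≤ data_abs < d2 in the functional description.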
The input signals of the look-up table (see Figure 14) are the reset signal rst, the address data addr[16:0], nmcycle[2] and the sign neg; the output signal is lut_data[15:0]. Its function is as follows: when the reset signal rst is active, the look-up table is initialized, and the value of each look-up entry is set to the function value corresponding to its tanh argument interval; neg and addr together form the index address of the entries in the look-up table; when an index address is valid, the stored value lut_data[15:0] of the corresponding look-up entry is output. The structure of the look-up table is shown in Figure 14: the look-up table consists mainly of 33 look-up entries, and the glue logic consists of 33 two-input OR gates and one 16-bit D flip-flop; the stored values of the 33 look-up entries mem_i (i = 32 ... 0), from the high address to the low address, are set in turn to:
(1.00000)_10, (1.00000)_10, (0.99999)_10, (0.99999)_10, (0.99998)_10, (0.99996)_10, (0.99909)_10, (0.99975)_10, (0.99932)_10, (0.99817)_10, (0.99505)_10, (0.98661)_10, (0.96402)_10, (0.90514)_10, (0.76159)_10, (0.46211)_10, (0)_10, (-0.50000)_10, (-0.10000)_10, (-0.15000)_10, (-0.20000)_10, (-0.25000)_10, (-0.30000)_10, (-0.35000)_10, (-0.40000)_10, (-0.45000)_10, (-0.50000)_10, (-0.55000)_10, (-0.60000)_10, (-0.65000)_10, (-0.70000)_10, (-0.75000)_10, (-0.80000)_10;
The outputs of the 33 two-input OR gates are connected respectively to the tri-state control terminals of the 33 look-up entries mem_i (i = 32 ... 0); neg and addr[15:0] are connected respectively to the two inputs of 17 two-input OR gates; neg and !addr[15:0] are connected respectively to the two inputs of 16 two-input OR gates. The outputs of the 33 storage cells are tied together, but only one value can be output at a time; the output value is connected to the data terminal d of flip-flop dff_20, nmcycle[2] is connected to the clock terminal clk of flip-flop dff_20, and the output q of flip-flop dff_20 is lut_data[15:0].
The input signals of the opposite-number arithmetic unit (see Fig. 2, Fig. 5 and Figure 15) are the floating-point adder output data Y[15:0], the cycle control signal mcycle[5] and the reset signal rst; the output signal is NY[15:0]. Its function is to compute the two's complement of the opposite number of the input data. The structure of the opposite-number arithmetic unit is shown in Figure 15: it comprises the data opposite-number complement arithmetic unit DD2 and the D flip-flop dff_21;
The input DIN of the data opposite-number complement arithmetic unit DD2 is connected to the output result of the floating-point adder, and its output DOUT is connected to the data input d of D flip-flop dff_21; the clock input clk of D flip-flop dff_21 is connected to mcycle[5], the reset input rst of D flip-flop dff_21 is connected to the external reset signal rst, and the output q of D flip-flop dff_21 outputs the opposite number of the floating-point adder output result. The operation process of the data opposite-number complement arithmetic unit DD2 is shown in Fig. 5.
By configuring a control word, the present invention implements the two activation functions most widely used in the field of neural networks, the sigmoid function and the tanh function. The structure is simple and uses a synchronous-clock design, which facilitates timing checking and verification; the area is small and the power consumption is low, which makes on-chip implementation easy and improves practicality in embedded applications. When the present invention is used to compute neural network activation functions, the processing flow is simple and easy to control, which improves the efficiency of activation function computation. According to the required tanh computation precision, the address generator module and the look-up table unit of the configurable neural network activation function implementation device can easily be extended to meet changing precision requirements. The device is therefore an ideal structure for implementing activation functions in embedded neural network processors.
Parts of the present invention that are not described in detail belong to the common knowledge of those skilled in the art.
Claims (14)
1. A configurable neural network activation function implementation device, characterized by comprising a controller, a sign judgment module, a range detection module, a parameter register, a floating-point multiplier, a floating-point adder, an opposite-number arithmetic unit, an address generator, a look-up table, a first gating latch and a second gating latch;
Controller: generates, according to the value of the configuration control signal M, the latch control signals required throughout the operation data path of the respective activation function, and the operation completion signal done;
Sign judgment module: receives the input operation data and determines its sign; if the data is positive, outputs the data to the range detection module, the address generator and the floating-point multiplier; otherwise outputs the absolute value of the data to the range detection module, the address generator and the floating-point multiplier, and at the same time outputs the sign of the data to the address generator and the first gating latch;
Range detection module: determines which interval the received data value lies in, and sends the range interval identification signal to the parameter register;
Parameter register: stores the parameters of the linear functions used to approximate the sigmoid activation function, namely the first-order coefficients and offsets; selects the linear function approximating the sigmoid activation function according to the range interval identification signal, outputs the first-order coefficient of that linear function to the floating-point multiplier, and outputs the offset of that linear function to the floating-point adder;
Floating-point multiplier: obtains from the parameter register the first-order coefficient of the linear function selected by the output of the range detection module, computes the product of that first-order coefficient and the data value, and outputs it to the floating-point adder;
Floating-point adder: obtains from the parameter register the offset of the linear function selected by the output of the range detection module, computes the sum of the product from the floating-point multiplier and the offset, and outputs the result to the opposite-number arithmetic unit and the first gating latch;
Opposite-number arithmetic unit: computes the opposite number of the floating-point adder output result and outputs it to the first gating latch;
First gating latch: when the sign of the input operation data is positive, gates the result from the floating-point adder to the second gating latch; when the sign of the input operation data is negative, gates the result from the opposite-number arithmetic unit to the second gating latch;
Address generator: generates a 17-bit address value according to the interval in which the data value output by the sign judgment module lies, to serve as the look-up table index;
Look-up table: stores the tanh activation function value corresponding to each data interval, looks up the tanh activation function value corresponding to the input operation data according to the look-up table index from the address generator, and outputs it to the second gating latch;
Second gating latch: according to the value of the configuration control signal M, gates, outputs and latches either the sigmoid activation function result or the tanh activation function result.
2. The configurable neural network activation function implementation device according to claim 1, characterized in that M = 1 indicates the sigmoid activation function operation, for which the controller generates the 8 latch control signals mcycle[7:1] required throughout the sigmoid operation data path; M = 0 indicates the tanh activation function operation, for which the controller generates the 4 latch control signals nmcycle[3:1] required throughout the tanh operation data path.
3. The configurable neural network activation function implementation device according to claim 2, characterized in that the controller comprises D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8, AND gates and21_0, and21_1, and21_2, and21_3, and21_4, and21_5, and21_6, and21_7, and21_8, and21_9 and and21_10, OR gates or21_0 and or21_1, two-input NOR gate nor21_0, inverter inv_0 and inverter inv_1;
The clock inputs clk of D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8 are connected to the external clock signal clk; the data input d of D flip-flop dff_0 is connected to "1", i.e. high level; the output q of D flip-flop dff_0 on the one hand outputs the cycle control signal cycle0 and on the other hand is connected to the data input d of D flip-flop dff_1; the output q of dff_1 is connected to the data input d of dff_2; the output q of dff_2 is connected to the data input d of dff_3; the output q of dff_3 is connected to the data input d of dff_4; the output q of dff_4 is connected to the data input d of dff_5; the output q of dff_5 is connected to the data input d of dff_6; the output q of dff_6 is connected to the data input d of dff_7; the output q of dff_7 is connected to the data input d of dff_8;
The input of inverter inv_0 is connected to the configuration control signal M; the output of inverter inv_0 is connected to the first input of AND gate and21_0, the first input of AND gate and21_2 and the first input of AND gate and21_4; the output q of D flip-flop dff_1 is connected to both the second input of and21_0 and the first input of and21_1; the configuration control signal M is connected to the second inputs of and21_1, and21_3, and21_5, and21_6, and21_7, and21_8, and21_9 and and21_10; the output q of dff_2 is connected to both the second input of and21_2 and the first input of and21_3; the output q of dff_3 is connected to both the second input of and21_4 and the first input of and21_5; the output q of dff_4 is connected to the first input of and21_6; the output q of dff_5 is connected to the first input of and21_7; the output q of dff_6 is connected to the first input of and21_8; the output q of dff_7 is connected to the first input of and21_9; the output q of dff_8 is connected to the first input of and21_10;
The output of and21_0 outputs the signal nmcycle[1], the output of and21_1 outputs the signal mcycle[1], the output of and21_2 outputs the signal nmcycle[2], the output of and21_3 outputs the signal mcycle[2], the output of and21_4 outputs the signal nmcycle[3], the output of and21_5 outputs the signal mcycle[3], the output of and21_6 outputs the signal mcycle[4], the output of and21_7 outputs the signal mcycle[5], the output of and21_8 outputs the signal mcycle[6], and the output of and21_9 outputs the signal mcycle[7];
The output of and21_6 is connected to the first input of OR gate or21_1, and the output of and21_10 is connected to the second input of or21_1; the output of or21_1 on the one hand outputs the operation completion signal done and on the other hand is connected to the first input of two-input NOR gate nor21_0; the input of inverter inv_1 is connected to the external reset signal rst, and the output of inv_1 is connected to the second input of nor21_0; the output of nor21_0 is connected to the reset inputs rst of D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8; the output of and21_5 is connected to the first input of OR gate or21_0, the output of and21_9 is connected to the second input of or21_0, and the output of or21_0 outputs the signal result_clk.
4. The configurable neural network activation function implementation device according to claim 2, characterized in that the sign judgment module comprises a data opposite-number complement arithmetic unit DD1, a 16-bit data latch and a D flip-flop dff_9;
The input DIN of the data opposite-number complement arithmetic unit DD1 and the input d2 of the 16-bit data latch are both connected to the input operation data, and the output DOUT of DD1 is connected to the input d1 of the 16-bit data latch;
Bit 11 of the input operation data, data[11], is connected to both the control signal ctrl of the 16-bit data latch and the data input d of D flip-flop dff_9;
The cycle control signal cycle0 is connected to both the clock input clk of the 16-bit data latch and the clock input clk of D flip-flop dff_9; the external reset signal rst is connected to both the reset input rst of the 16-bit data latch and the reset input rst of D flip-flop dff_9;
The output d3 of the 16-bit data latch outputs the absolute value data_abs[15:0] of the input operation data, and the output q of D flip-flop dff_9 outputs the sign neg of the input operation data.
5. The configurable neural network activation function implementation device according to claim 2, characterized in that the opposite-number arithmetic unit comprises a data opposite-number complement arithmetic unit DD2 and a D flip-flop dff_21;
The input DIN of the data opposite-number complement arithmetic unit DD2 is connected to the output result of the floating-point adder, and its output DOUT is connected to the data input d of D flip-flop dff_21;
The clock input clk of D flip-flop dff_21 is connected to mcycle[5], the reset input rst of D flip-flop dff_21 is connected to the external reset signal rst, and the output q of D flip-flop dff_21 outputs the opposite number of the floating-point adder output result.
6. The configurable neural network activation function implementation device according to claim 4 or 5, characterized in that the data opposite-number complement arithmetic units DD1 and DD2 operate by the same process, as follows:
Step 1: denote the data received at the input DIN as DIN[15:0], compute the opposite-number complement D_MID[13:0] of the mantissa of DIN[15:0], and then perform Steps 2, 3 and 4 simultaneously to examine D_MID[13:0];
Step 2: determine whether D_MID[13:0] is 0; if so, go to Step 5;
Step 3: determine whether D_MID[13:0] overflows, an overflow being indicated by D_MID[13] xor D_MID[12] = 1; if it overflows, go to Step 6;
Step 4: if D_MID[13] xor D_MID[12] = 0, go to Step 7;
Step 5: set D_MID[17:14] to the most negative value to obtain the correct representation of 0, and then perform Steps 8, 9 and 10 simultaneously to examine D_MID[17:14];
Step 6: shift D_MID[13:0] right by one bit and set the exponent D_MID[17:14] = DIN[15:12] + 1, and then perform Steps 8, 9 and 10 simultaneously to examine D_MID[17:14];
Step 7: shift D_MID[13:0] left by K bits and set the exponent D_MID[17:14] = DIN[15:12] - K, and then perform Steps 8, 9 and 10 simultaneously to examine D_MID[17:14];
Step 8: determine whether the exponent result D_MID[17:14] overflows in the positive direction; if so, go to Step 11;
Step 9: determine whether the exponent result D_MID[17:14] overflows in the negative direction; if so, go to Step 12;
Step 10: determine whether the exponent result D_MID[17:14] is within range; if so, go to Step 13;
Step 11: if D_MID[13:0] > 0, set D_MID to the largest positive floating-point value; if D_MID[13:0] < 0, set D_MID to the largest negative floating-point value; go to Step 13;
Step 12: set D_MID[13:0] to 0 and D_MID[17:14] to -8; go to Step 13;
Step 13: remove the sign extension bit and the implicit bit from the 18-bit operand D_MID to obtain the 16-bit floating-point data DOUT[15:0] output at the DOUT terminal, where DOUT[15:12] = D_MID[17:14], DOUT[11] = D_MID[12], DOUT[10:0] = D_MID[10:0].
7. The configurable neural network activation function implementation device according to claim 2, characterized in that the range detection module comprises floating-point comparators fcom_1, fcom_2 and fcom_3, inverters inv_3, inv_4 and inv_5, a two-input OR gate or21_2, three-input OR gates or31_0 and or31_1, and D flip-flops dff_11, dff_12, dff_13 and dff_14;
The first input of floating-point comparator fcom_1 is connected to the data data_abs[15:0] output by the sign judgment module, and its second input is connected to constant C0, where C0 is (5.0000)₁₀;
The first input of floating-point comparator fcom_2 is connected to the data data_abs[15:0] output by the sign judgment module, and its second input is connected to constant C1, where C1 is (1.0000)₁₀;
The first input of floating-point comparator fcom_3 is connected to the data data_abs[15:0] output by the sign judgment module, and its second input is connected to constant C2, where C2 is (1.0000)₁₀;
The output of floating-point comparator fcom_1 is connected to both the data input d of D flip-flop dff_11 and the input of inverter inv_3; the output of inverter inv_3 is connected to the first input of two-input OR gate or21_2, the first input of three-input OR gate or31_0 and the first input of three-input OR gate or31_1; the output of floating-point comparator fcom_2 is connected to both the input of inverter inv_4 and the second input of two-input OR gate or21_2; the output of two-input OR gate or21_2 is connected to the data input d of D flip-flop dff_12; the output of inverter inv_4 is connected to both the second input of three-input OR gate or31_0 and the second input of three-input OR gate or31_1; the output of floating-point comparator fcom_3 is connected to both the third input of three-input OR gate or31_0 and the input of inverter inv_5; the output of inverter inv_5 is connected to the third input of three-input OR gate or31_1; the output of three-input OR gate or31_0 is connected to the data input d of D flip-flop dff_13, and the output of three-input OR gate or31_1 is connected to the data input d of D flip-flop dff_14;
The reset inputs rst of D flip-flops dff_11, dff_12, dff_13 and dff_14 are connected to the externally input reset signal rst, and their clock inputs clk are connected to mcycle[1]; output q of D flip-flop dff_14 outputs signal range[0], output q of D flip-flop dff_13 outputs signal range[1], output q of D flip-flop dff_12 outputs signal range[2], and output q of D flip-flop dff_11 outputs signal range[3].
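The combinational part of the range detection module can be sketched in Python as follows. The sketch takes the three comparator outputs as inputs and ignores the D flip-flops, so it does not depend on the constants C0, C1 and C2. The function name `range_flags` is illustrative. The reading of the flags as an active-low one-hot code relies on the comparator convention of claim 10 (output 1 when the input is below the constant) and is an interpretation rather than a statement of the claims.

```python
def range_flags(f1, f2, f3):
    """Gate network of claim 7; f1, f2, f3 are the outputs of fcom_1..fcom_3.

    Returns (range3, range2, range1, range0) as seen at the flip-flop d inputs.
    """
    n1, n2, n3 = 1 - f1, 1 - f2, 1 - f3  # inverters inv_3, inv_4, inv_5
    range3 = f1                          # fcom_1 drives dff_11 directly
    range2 = n1 | f2                     # two-input OR gate or21_2
    range1 = n1 | n2 | f3                # three-input OR gate or31_0
    range0 = n1 | n2 | n3                # three-input OR gate or31_1
    return (range3, range2, range1, range0)
```

With decreasing thresholds the comparator outputs can only take the patterns 000, 100, 110 and 111, and each pattern clears exactly one of the four flags.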
8. The configurable neural network activation function implementation device according to claim 2, characterized in that the address generator comprises four 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4;
The cin terminal of addrgen1 is grounded; the cout terminal of addrgen1 is connected to the cin terminal of addrgen2, the cout terminal of addrgen2 is connected to the cin terminal of addrgen3, and the cout terminal of addrgen3 is connected to the cin terminal of addrgen4; the clk terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to nmcycle[1]; the rst terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to the externally input reset signal rst; the d[15:0] terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to the data output by the sign judgment module;
The d3[15:0] terminal of addrgen1 is connected to the floating-point number corresponding to constant (8)₁₀, the d2[15:0] terminal to the floating-point number corresponding to constant (7.5)₁₀, the d1[15:0] terminal to the floating-point number corresponding to constant (7)₁₀, and the d0[15:0] terminal to the floating-point number corresponding to constant (6.5)₁₀; output out[3:0] of addrgen1 outputs address addr[16:13];
The d3[15:0] terminal of addrgen2 is connected to the floating-point number corresponding to constant (6)₁₀, the d2[15:0] terminal to the floating-point number corresponding to constant (5.5)₁₀, the d1[15:0] terminal to the floating-point number corresponding to constant (5)₁₀, and the d0[15:0] terminal to the floating-point number corresponding to constant (4.5)₁₀; output out[3:0] of addrgen2 outputs address addr[12:9];
The d3[15:0] terminal of addrgen3 is connected to the floating-point number corresponding to constant (4)₁₀, the d2[15:0] terminal to the floating-point number corresponding to constant (3.5)₁₀, the d1[15:0] terminal to the floating-point number corresponding to constant (3)₁₀, and the d0[15:0] terminal to the floating-point number corresponding to constant (2.5)₁₀; output out[3:0] of addrgen3 outputs address addr[8:5];
The d3[15:0] terminal of addrgen4 is connected to the floating-point number corresponding to constant (2)₁₀, the d2[15:0] terminal to the floating-point number corresponding to constant (1.5)₁₀, the d1[15:0] terminal to the floating-point number corresponding to constant (1)₁₀, and the d0[15:0] terminal to the floating-point number corresponding to constant (0.5)₁₀; output out[3:0] of addrgen4 outputs address addr[4:1], and the cout terminal of addrgen4 outputs addr[0].
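The overall effect of the four cascaded modules can be summarized by a behavioral sketch in Python. It uses ordinary floats instead of the 16-bit floating-point format. It also assumes, following the gate connections of claim 9 and the comparator convention of claim 10, that the 17-bit address is an active-low one-hot code in which addr[k] is cleared when the input lies in the k-th interval of width 0.5. The function name `address_one_hot` is illustrative.

```python
def address_one_hot(x_abs):
    """Return the 17-bit address word addr[16:0] for a non-negative input.

    Exactly one bit is 0 (active low): bit k for 0.5*k <= x_abs < 0.5*(k+1),
    with bit 16 covering every input of 8 or more.
    """
    if x_abs < 0:
        raise ValueError("the address generator receives an absolute value")
    k = min(16, int(x_abs // 0.5))   # index of the 0.5-wide interval
    all_ones = (1 << 17) - 1
    return all_ones & ~(1 << k)
```

For example, an input below 0.5 clears only addr[0], which is the bit supplied by the cout terminal of addrgen4, and any input of 8 or more clears only addr[16].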
9. The configurable neural network activation function implementation device according to claim 8, characterized in that the four 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4 have the same structure, each comprising floating-point comparators fcom_4, fcom_5, fcom_6 and fcom_7, inverters inv_6, inv_7, inv_8 and inv_9, a two-input OR gate or21_3, a three-input OR gate or31_2, a four-input OR gate or41_0, five-input OR gates or51_0 and or51_1, and D flip-flops dff_16, dff_17, dff_18 and dff_19;
Floating-point comparators fcom_4, fcom_5, fcom_6 and fcom_7 each have two inputs;
The first input of floating-point comparator fcom_4 serves as the d[15:0] terminal of the 4-bit address generation module, and its second input serves as the d3[15:0] terminal of the 4-bit address generation module;
The first input of floating-point comparator fcom_5 is connected to the first input of floating-point comparator fcom_4, and its second input serves as the d2[15:0] terminal of the 4-bit address generation module;
The first input of floating-point comparator fcom_6 is connected to the first input of floating-point comparator fcom_4, and its second input serves as the d1[15:0] terminal of the 4-bit address generation module;
The first input of floating-point comparator fcom_7 is connected to the first input of floating-point comparator fcom_4, and its second input serves as the d0[15:0] terminal of the 4-bit address generation module;
The first input of two-input OR gate or21_3, the first input of three-input OR gate or31_2, the first input of four-input OR gate or41_0, the first input of five-input OR gate or51_0 and the first input of five-input OR gate or51_1 are connected together and serve as the cin terminal of the 4-bit address generation module;
The output of floating-point comparator fcom_4 is connected to both the input of inverter inv_6 and the second input of two-input OR gate or21_3; the output of two-input OR gate or21_3 is connected to input d of D flip-flop dff_16;
The output of floating-point comparator fcom_5 is connected to both the input of inverter inv_7 and the third input of three-input OR gate or31_2; the output of inverter inv_6 is connected to the second input of three-input OR gate or31_2; the output of three-input OR gate or31_2 is connected to input d of D flip-flop dff_17;
The output of floating-point comparator fcom_6 is connected to both the input of inverter inv_8 and the fourth input of four-input OR gate or41_0; the output of inverter inv_6 is connected to the second input of four-input OR gate or41_0; the output of inverter inv_7 is connected to the third input of four-input OR gate or41_0; the output of four-input OR gate or41_0 is connected to input d of D flip-flop dff_18;
The output of floating-point comparator fcom_7 is connected to both the input of inverter inv_9 and the fifth input of five-input OR gate or51_0; the output of inverter inv_6 is connected to the second input of five-input OR gate or51_0; the output of inverter inv_7 is connected to the third input of five-input OR gate or51_0; the output of inverter inv_8 is connected to the fourth input of five-input OR gate or51_0; the output of five-input OR gate or51_0 is connected to input d of D flip-flop dff_19;
The output of inverter inv_6 is connected to the second input of five-input OR gate or51_1; the output of inverter inv_7 is connected to the third input of five-input OR gate or51_1; the output of inverter inv_8 is connected to the fourth input of five-input OR gate or51_1; the output of inverter inv_9 is connected to the fifth input of five-input OR gate or51_1; the output of five-input OR gate or51_1 serves as the cout terminal of the 4-bit address generation module;
The reset inputs rst of D flip-flops dff_16, dff_17, dff_18 and dff_19 are connected together and serve as the rst terminal of the 4-bit address generation module, and their clock inputs clk are connected together and serve as the clk terminal of the 4-bit address generation module; D flip-flop dff_16 outputs address bit 3, D flip-flop dff_17 outputs address bit 2, D flip-flop dff_18 outputs address bit 1, and D flip-flop dff_19 outputs address bit 0.
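The gate network of one 4-bit address generation module can be written out as a short Python sketch. It models only the combinational logic ahead of the D flip-flops and takes the four comparator outputs as inputs. The function name `addrgen_logic` is illustrative and is not used in the claims.

```python
def addrgen_logic(cin, f4, f5, f6, f7):
    """Combinational logic of one 4-bit address generation module.

    f4..f7 are the outputs of fcom_4..fcom_7 (compared against d3..d0).
    Returns ((out3, out2, out1, out0), cout).
    """
    n4, n5, n6, n7 = 1 - f4, 1 - f5, 1 - f6, 1 - f7  # inv_6 .. inv_9
    out3 = cin | f4                        # or21_3 -> dff_16
    out2 = cin | n4 | f5                   # or31_2 -> dff_17
    out1 = cin | n4 | n5 | f6              # or41_0 -> dff_18
    out0 = cin | n4 | n5 | n6 | f7         # or51_0 -> dff_19
    cout = cin | n4 | n5 | n6 | n7         # or51_1 -> cout terminal
    return (out3, out2, out1, out0), cout
```

An output bit is 0 only when cin is 0, every higher comparator reports 1 and the bit's own comparator reports 0. The cout terminal becomes 1 as soon as any comparator in the module reports 0, which forces every output of the following modules to 1.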
10. The configurable neural network activation function implementation device according to claim 7 or 9, characterized in that floating-point comparators fcom_1, fcom_2, fcom_3, fcom_4, fcom_5, fcom_6 and fcom_7 follow the same workflow, the workflow of each floating-point comparator being as follows:
Step 1: compare the mantissas of the first-input data and the second-input constant:
according to the floating-point data format, restore the implicit bit of each mantissa and extend it by one sign bit;
subtract the mantissa of the second-input constant from the mantissa of the first-input data, and denote the result dm[13:0];
Step 2: compare the exponents of the first-input data and the second-input constant:
extend each exponent by one sign bit, subtract the exponent of the second-input constant from the exponent of the first-input data, and denote the result de[4:0];
Step 3: determine the sign of the mantissa comparison result dm[13:0]:
compute sdm using the formula sdm = dm[13] xor dm[12];
when sdm is 0, dm[13:0] is positive; when sdm is 1, dm[13:0] is negative;
Step 4: determine the sign of the exponent comparison result de:
compute sde using the formula sde = de[4] xor de[3];
when sde is 0, de[4:0] is positive; when sde is 1, de[4:0] is negative;
Step 5: determine the comparison result: if sde is 0 and de[4:0] is not 0, the first-input data is greater than the second-input constant and the output is 0; if sde is 0, de[4:0] is 0 and sdm is 0, the first-input data is greater than the second-input constant and the output is 0; if sde is 0, de[4:0] is 0 and sdm is 1, the first-input data is less than the second-input constant and the output is 1; if sde is 1, the first-input data is less than the second-input constant and the output is 1.
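The decision table of Step 5 can be sketched behaviorally in Python. The sketch assumes positive operands that are already split into an integer exponent and an integer mantissa with the implicit bit restored. It derives sde and sdm from the sign of ordinary integer differences rather than from the bit-level xor expressions of Steps 3 and 4, so it is a model of the decision logic only. The function name `fcom` and its parameters are illustrative.

```python
def fcom(exp_a, man_a, exp_b, man_b):
    """Return 1 if input a is less than constant b, otherwise 0.

    Both operands are assumed positive, with normalized mantissas given as
    plain integers (implicit bit included).
    """
    dm = man_a - man_b            # Step 1: mantissa difference
    de = exp_a - exp_b            # Step 2: exponent difference
    sdm = 1 if dm < 0 else 0      # Step 3: sign of mantissa difference
    sde = 1 if de < 0 else 0      # Step 4: sign of exponent difference
    if sde == 1:                  # smaller exponent: a < b
        return 1
    if de != 0:                   # larger exponent: a > b
        return 0
    return sdm                    # equal exponents: mantissas decide
```

The exponent difference dominates, and the mantissa difference is consulted only when the exponents are equal, which matches the order of the four cases in Step 5.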
11. The configurable neural network activation function implementation device according to claim 2, characterized in that the parameter register comprises four 32-bit tri-state control gates and a 32-bit D flip-flop dff_15; the control signal terminal of the j-th 32-bit tri-state control gate is connected to bit j of the range interval identification signal output by the range detection module, j = 0, 1, 2, 3; the reset signal terminals of the four 32-bit tri-state control gates are all connected to the externally input reset signal rst; the outputs of the four 32-bit tri-state control gates are connected to the data input d of the 32-bit D flip-flop dff_15; the reset input rst of the 32-bit D flip-flop dff_15 is connected to the externally input reset signal rst, and its clock input clk is connected to mcycle[2]; of the output q of the 32-bit D flip-flop dff_15, the upper 16 bits are the first-order coefficient A[15:0] and the lower 16 bits are the offset B[15:0].
12. The configurable neural network activation function implementation device according to claim 2, characterized in that the look-up table comprises 33 16-bit tri-state control gates, 33 two-input OR gates, 16 inverters and one 16-bit D flip-flop dff_20;
The second inputs of the first 17 two-input OR gates are all connected to the sign output by the sign judgment module; among the first 17 two-input OR gates, the first input of the k1-th two-input OR gate is connected to bit k1 of the address generated by the address generator, k1 = 0, 1, 2, ..., 16;
The last 16 two-input OR gates correspond one-to-one with the 16 inverters; the second input of each of the last 16 two-input OR gates is connected to the output of its corresponding inverter, and the inputs of the 16 inverters are all connected to the sign output by the sign judgment module; among the last 16 two-input OR gates, the first input of the k2-th two-input OR gate is connected to bit (k2 - 16) of the address generated by the address generator, k2 = 17, 18, 19, ..., 32;
The output of each two-input OR gate is connected to the second input of its corresponding 16-bit tri-state control gate; the first inputs of the 33 16-bit tri-state control gates and the reset input rst of the 16-bit D flip-flop dff_20 are all connected to the externally input reset signal rst; the outputs of the 33 16-bit tri-state control gates are tied together and connected to the data input d of the 16-bit D flip-flop dff_20; the clock input clk of the 16-bit D flip-flop dff_20 is connected to nmcycle[2]; output q of the 16-bit D flip-flop dff_20 outputs the tanh activation function value lut_data[15:0] corresponding to the input operand.
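The way the 33 two-input OR gates combine the address bits with the sign can be sketched in Python. The sketch computes only the 33 control lines that feed the tri-state control gates. It assumes, as an interpretation not stated in the claim, that a tri-state gate drives the shared bus when its control line is 0. The table contents are not modeled, and the function name `lut_enables` is illustrative.

```python
def lut_enables(addr_bits, sign):
    """Return the 33 OR-gate outputs for a 17-entry address list and a sign bit.

    addr_bits[k] is address bit k (k = 0..16); sign is 0 or 1.
    """
    if len(addr_bits) != 17:
        raise ValueError("expected address bits addr[0..16]")
    inverted_sign = 1 - sign                         # the 16 inverters
    first = [addr_bits[k] | sign for k in range(17)]               # gates 0..16
    second = [addr_bits[k - 16] | inverted_sign for k in range(17, 33)]  # gates 17..32
    return first + second
```

With a sign of 0 only the first 17 control lines can go low, and with a sign of 1 only the last 16 can, so the sign selects between two banks of table entries indexed by the same address bits.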
13. The configurable neural network activation function implementation device according to claim 2, characterized in that the first gated latch comprises 16 gating latch units of identical structure;
The p-th gating latch unit comprises selector mux_1, inverter inv_2 and D flip-flop dff_22; the A input of selector mux_1 is connected to bit p of the floating-point adder result, and the B input of selector mux_1 is connected to bit p of the negation arithmetic unit result; the output of selector mux_1 is connected to the d input of D flip-flop dff_22; the input of inverter inv_2 and the AS terminal of selector mux_1 are both connected to the data sign output by the sign judgment module, and the output of inverter inv_2 is connected to the BS terminal of selector mux_1; the clock input clk of D flip-flop dff_22 is connected to mcycle[6], and the reset input rst of D flip-flop dff_22 is connected to the externally input reset signal; the output of D flip-flop dff_22 outputs bit p of the output data of the first gated latch, p = 0, 1, 2, 3, ..., 15.
14. The configurable neural network activation function implementation device according to claim 2, characterized in that the second gated latch comprises 16 gating latch units of identical structure;
The q-th gating latch unit comprises selector mux_2, inverter inv_2' and D flip-flop dff_23; the A input of selector mux_2 is connected to bit q of the look-up table output, and the B input of selector mux_2 is connected to bit q of the output of the first gated latch; the output of selector mux_2 is connected to the d input of D flip-flop dff_23; the input of inverter inv_2' and the AS terminal of selector mux_2 are both connected to the configuration control signal M, and the output of inverter inv_2' is connected to the BS terminal of selector mux_2; the clock input clk of D flip-flop dff_23 is connected to the cycle control signal of the controller, and the reset input rst of D flip-flop dff_23 is connected to the externally input reset signal; the output of D flip-flop dff_23 outputs bit q of the output data of the second gated latch.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910041332.7A CN109816105B (en) | 2019-01-16 | 2019-01-16 | Configurable neural network activation function implementation device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109816105A (en) | 2019-05-28 |
CN109816105B (en) | 2021-02-23 |
Family
ID=66604394
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910041332.7A Active CN109816105B (en) | 2019-01-16 | 2019-01-16 | Configurable neural network activation function implementation device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109816105B (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020066046A1 (en) * | 2000-10-24 | 2002-05-30 | Chin-Shuing Liu | Apparatus for directly connecting to the internet and method thereof |
TW201331855A (en) * | 2012-01-19 | 2013-08-01 | Univ Nat Taipei Technology | High-speed hardware-based back-propagation feedback type artificial neural network with free feedback nodes |
CN107003989A (en) * | 2014-12-19 | 2017-08-01 | 英特尔公司 | Method and apparatus for distributed and cooperative computation in artificial neural networks |
CN108885600A (en) * | 2016-03-16 | 2018-11-23 | 美光科技公司 | Apparatuses and methods for operations using compressed and decompressed data |
CN108781265A (en) * | 2016-03-30 | 2018-11-09 | 株式会社尼康 | Feature extraction element, feature extraction system, and determination apparatus |
CN106022468A (en) * | 2016-05-17 | 2016-10-12 | 成都启英泰伦科技有限公司 | Artificial neural network processor integrated circuit and design method therefor |
CN105987775A (en) * | 2016-07-20 | 2016-10-05 | 天津理工大学中环信息学院 | Temperature sensor nonlinearity correction method and system based on BP neural network |
CN107844439A (en) * | 2016-09-20 | 2018-03-27 | 三星电子株式会社 | Memory device and system supporting command bus training, and operating method thereof |
EP3343463A1 (en) * | 2016-12-31 | 2018-07-04 | VIA Alliance Semiconductor Co., Ltd. | Neural network unit with re-shapeable memory |
CN108564169A (en) * | 2017-04-11 | 2018-09-21 | 上海兆芯集成电路有限公司 | Hardware processing element, neural network unit and computer usable medium |
KR20180120009A (en) * | 2017-04-26 | 2018-11-05 | 광주과학기술원 | A stochastic implementation method of an activation function for an artificial neural network and a system including the same |
Non-Patent Citations (2)
Title |
---|
CHE-WEI LIN, ET AL.: "A digital circuit design of hyperbolic tangent sigmoid function for neural networks", 2008 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS * |
WU CHENGJUN, ET AL.: "Circuit design of approximate adders for neural network accelerators", AERONAUTICAL SCIENCE & TECHNOLOGY (航空科学技术) * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110414677A (en) * | 2019-07-11 | 2019-11-05 | 东南大学 | In-memory computing circuit suitable for fully connected binarized neural networks |
CN110610235A (en) * | 2019-08-22 | 2019-12-24 | 北京时代民芯科技有限公司 | Neural network activation function calculation circuit |
CN110610235B (en) * | 2019-08-22 | 2022-05-13 | 北京时代民芯科技有限公司 | Neural network activation function calculation circuit |
TWI755043B (en) * | 2019-09-04 | 2022-02-11 | 美商聖巴諾瓦系統公司 | Sigmoid function in hardware and a reconfigurable data processor including same |
CN111047007A (en) * | 2019-11-06 | 2020-04-21 | 北京中科胜芯科技有限公司 | Activation function calculation unit for quantized LSTM |
CN111047007B (en) * | 2019-11-06 | 2021-07-30 | 北京中科胜芯科技有限公司 | Activation function calculation unit for quantized LSTM |
CN112256094A (en) * | 2020-11-13 | 2021-01-22 | 广东博通科技服务有限公司 | Deep learning-based activation function device and use method thereof |
CN112734023A (en) * | 2021-02-02 | 2021-04-30 | 中国科学院半导体研究所 | Reconfigurable circuit applied to activation function of recurrent neural network |
CN112734023B (en) * | 2021-02-02 | 2023-10-13 | 中国科学院半导体研究所 | Reconfigurable circuit applied to activation function of cyclic neural network |
Also Published As
Publication number | Publication date |
---|---|
CN109816105B (en) | 2021-02-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109816105A (en) | Configurable neural network activation function implementation device | |
Liu et al. | A stochastic computational multi-layer perceptron with backward propagation | |
CN110058840A (en) | Low-power multiplier based on radix-4 Booth coding | |
CN108537332A (en) | Hardware-efficient implementation method of the Sigmoid function based on the Remez algorithm | |
Qin et al. | A novel approximation methodology and its efficient vlsi implementation for the sigmoid function | |
CN108921292A (en) | Approximate computing system for deep neural network accelerator applications | |
Wang et al. | Constructing higher-dimensional digital chaotic systems via loop-state contraction algorithm | |
Scholl | Multi-output functional decomposition with exploitation of don't cares | |
Zhang et al. | Base-2 softmax function: Suitability for training and efficient hardware implementation | |
Ajit et al. | FPGA based performance comparison of different basic adder topologies with parallel processing adder | |
Chen et al. | A cordic-based architecture with adjustable precision and flexible scalability to implement sigmoid and tanh functions | |
Raghuram et al. | Digital implementation of the softmax activation function and the inverse softmax function | |
Perri et al. | Parallel architecture of power‐of‐two multipliers for FPGAs | |
Pan et al. | A semi-tensor product based all solutions boolean satisfiability solver | |
Madenda et al. | New Approach of Signed Binary Numbers Multiplication and Its Implementation on FPGA | |
Dakhole et al. | Multi-digit quaternary adder on programmable device: Design & verification | |
Jalilvand et al. | Fuzzy-logic using unary bit-stream processing | |
US20210383264A1 (en) | Method and Architecture for Fuzzy-Logic Using Unary Processing | |
CN110458277A (en) | Configurable-precision convolution hardware structure suitable for deep learning hardware accelerators | |
Hacene et al. | Efficient hardware implementation of incremental learning and inference on chip | |
Madoš et al. | Field Programmable Gate Array hardware accelerator of prime implicants generation for single-output Boolean functions minimization | |
Métivier | An algorithm for computing asynchronous automata in the case of acyclic non-commutation graphs | |
CN112949830B (en) | Intelligent inference network system and addition unit and pooling unit circuitry | |
Gonzalez-Guerrero et al. | Asynchronous Stochastic Computing | |
Chhabra et al. | A Design Approach for Mac Unit Using Vedic Multiplier |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||