CN109816105A - Configurable neural network activation function implementation device - Google Patents

Configurable neural network activation function implementation device

Info

Publication number
CN109816105A
Authority
CN
China
Prior art keywords
input
gate
type flip
flip flop
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910041332.7A
Other languages
Chinese (zh)
Other versions
CN109816105B (en)
Inventor
车德亮
李娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Microelectronic Technology Institute
Mxtronics Corp
Original Assignee
Beijing Microelectronic Technology Institute
Mxtronics Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Microelectronic Technology Institute, Mxtronics Corp filed Critical Beijing Microelectronic Technology Institute
Priority to CN201910041332.7A priority Critical patent/CN109816105B/en
Publication of CN109816105A publication Critical patent/CN109816105A/en
Application granted granted Critical
Publication of CN109816105B publication Critical patent/CN109816105B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Logic Circuits (AREA)

Abstract

The invention discloses a configurable neural network activation function implementation device comprising a controller, a sign judgment module, a range detection module, a parameter register, a floating-point multiplier, a floating-point adder, a negation unit, an address generator, a look-up table, a first gated latch and a second gated latch. The configuration control signal M selects between sigmoid and tanh operation. The device has a simple structure and uses a synchronous clock design, which eases timing checks and verification; it is small in area, low in power consumption and easy to implement on chip, which improves its practicality in embedded applications. When the invention is used to compute a neural network activation function, the processing flow is simple and easy to control, which improves the efficiency of the computation. The address generator and the look-up table can readily be extended to meet the required tanh accuracy. The invention is therefore well suited to implementing activation functions in embedded neural network processors.

Description

Configurable neural network activation function implementation device
Technical field
The present invention relates to a configurable neural network activation function implementation device and belongs to the field of computer technology.
Background
Applications of artificial intelligence have developed rapidly in recent years. Neural network architectures based on machine learning, in particular, have made significant progress in data mining, classification, and image and speech recognition. Demand is growing explosively for neural network hardware that is trained for a specific intelligent application and raises the intelligent processing capability of embedded systems. The activation function is the core of a neuron's function in a neural network, and implementing it efficiently and compactly has become one of the key technical problems for embedded neural network applications. Activation functions are transcendental functions, and their concrete hardware implementations are core technology that is kept restricted abroad.
Summary of the invention
The technical problem solved by the invention: to provide a configurable neural network activation function implementation device that is simple in structure, small in area, low in power consumption and easy to implement on chip. When the invention is used to compute a neural network activation function, the processing flow is simple and easy to control, which improves the efficiency of the computation and the practicality of embedded applications.
The technical solution of the invention is as follows:
A configurable neural network activation function implementation device comprises a controller, a sign judgment module, a range detection module, a parameter register, a floating-point multiplier, a floating-point adder, a negation unit, an address generator, a look-up table, a first gated latch and a second gated latch;
Controller: according to the value of the configuration control signal M, generates the latch control signals required along the entire data path of the selected activation function, together with the operation-complete signal done;
Sign judgment module: receives the input operand and determines its sign. If the operand is positive, it is output to the range detection module, the address generator and the floating-point multiplier; otherwise its absolute value is output to those three modules, and the sign of the operand is output to the address generator and the first gated latch;
Range detection module: determines which interval the received data value falls in and sends a range-interval identification signal to the parameter register;
Parameter register: stores the parameters of the linear functions used to approximate the sigmoid activation function, namely the first-order coefficient and the offset. Based on the range-interval identification signal, it selects the linear function that approximates the sigmoid activation function, outputs that function's first-order coefficient to the floating-point multiplier and outputs its offset to the floating-point adder;
Floating-point multiplier: takes from the parameter register the first-order coefficient of the linear function selected by the range detection module, computes the product of that coefficient and the data value, and outputs the product to the floating-point adder;
Floating-point adder: takes from the parameter register the offset of the linear function selected by the range detection module, computes the sum of the product from the floating-point multiplier and the offset, and outputs the result to the negation unit and the first gated latch;
Negation unit: computes the negative of the floating-point adder's result and outputs it to the first gated latch;
First gated latch: when the sign of the input operand is positive, gates the result from the floating-point adder through to the second gated latch; when the sign of the input operand is negative, gates the result from the negation unit through to the second gated latch;
Address generator: generates a 17-bit address value according to the interval in which the data value output by the sign judgment module falls; this address serves as the look-up table index;
Look-up table: stores the tanh activation function value for each data interval and, using the look-up table index from the address generator, finds the tanh activation function value corresponding to the input operand and outputs it to the second gated latch;
Second gated latch: according to the value of the configuration control signal M, gates through and latches either the sigmoid result or the tanh result.
M=1 selects the sigmoid activation function: the controller generates the 8 latch control signals mcycle[7:1] required along the entire sigmoid data path. M=0 selects the tanh activation function: the controller generates the 4 latch control signals nmcycle[3:1] required along the entire tanh data path.
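The sigmoid path described above (range detection, parameter selection, multiply, add) can be sketched in software as follows. This is only an illustrative sketch: the text does not list the stored coefficients, so the segment table below uses the widely cited PLAN piecewise-linear approximation of the sigmoid as a stand-in, and the names PLAN_SEGMENTS, select_parameters and sigmoid_magnitude_path are hypothetical. Sign handling by the negation unit and the first gated latch is left out, and ordinary Python floats replace the device's 16-bit floating-point format.

```python
# Hypothetical segment table: (lower bound of |x|, slope A, offset B).
# The values are the PLAN approximation, used here as stand-ins; the
# patent text does not give the parameters held in the parameter register.
PLAN_SEGMENTS = [
    (5.0,   0.0,     1.0),
    (2.375, 0.03125, 0.84375),
    (1.0,   0.125,   0.625),
    (0.0,   0.25,    0.5),
]


def select_parameters(abs_x):
    """Range detection plus parameter register: pick (A, B) for |x|."""
    for lower, slope, offset in PLAN_SEGMENTS:
        if abs_x >= lower:
            return slope, offset
    raise ValueError("abs_x must be non-negative")


def sigmoid_magnitude_path(abs_x):
    """Floating-point multiplier followed by adder: A * |x| + B."""
    slope, offset = select_parameters(abs_x)
    return slope * abs_x + offset
```

For example, `sigmoid_magnitude_path(2.0)` falls in the segment starting at 1.0 and evaluates 0.125 * 2.0 + 0.625. Keeping the slopes as powers of two is a property of the stand-in table, not something the text states about the device.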
The controller comprises D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8; AND gates and21_0, and21_1, and21_2, and21_3, and21_4, and21_5, and21_6, and21_7, and21_8, and21_9 and and21_10; OR gates or21_0 and or21_1; a two-input NOR gate nor21_0; and inverters inv_0 and inv_1;
The clock inputs clk of D flip-flops dff_0 through dff_8 are connected to the external clock signal clk. The data input d of dff_0 is tied to logic "1" (high level). The output q of dff_0 provides the cycle control signal cycle0 and is also connected to the data input d of dff_1. The remaining flip-flops are chained in the same way: the output q of dff_1 is connected to the data input d of dff_2, dff_2 to dff_3, dff_3 to dff_4, dff_4 to dff_5, dff_5 to dff_6, dff_6 to dff_7, and dff_7 to dff_8;
The input of inverter inv_0 is connected to the configuration control signal M. The output of inv_0 is connected to the first inputs of AND gates and21_0, and21_2 and and21_4. The output q of dff_1 is connected to the second input of and21_0 and the first input of and21_1. The configuration control signal M is connected to the second inputs of and21_1, and21_3, and21_5, and21_6, and21_7, and21_8, and21_9 and and21_10. The output q of dff_2 is connected to the second input of and21_2 and the first input of and21_3, and the output q of dff_3 to the second input of and21_4 and the first input of and21_5. The outputs q of dff_4, dff_5, dff_6, dff_7 and dff_8 are connected to the first inputs of and21_6, and21_7, and21_8, and21_9 and and21_10, respectively;
AND gate and21_0 outputs nmcycle[1], and21_1 outputs mcycle[1], and21_2 outputs nmcycle[2], and21_3 outputs mcycle[2], and21_4 outputs nmcycle[3], and21_5 outputs mcycle[3], and21_6 outputs mcycle[4], and21_7 outputs mcycle[5], and21_8 outputs mcycle[6], and and21_9 outputs mcycle[7];
The output of and21_6 is connected to the first input of OR gate or21_1, and the output of and21_10 to its second input. The output of or21_1 provides the operation-complete signal done and is also connected to the first input of the two-input NOR gate nor21_0. The input of inverter inv_1 is connected to the external reset signal rst, and its output to the second input of nor21_0. The output of nor21_0 is connected to the reset inputs rst of dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8. The output of and21_5 is connected to the first input of OR gate or21_0 and the output of and21_9 to its second input; the output of or21_0 provides the signal result_clk.
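The flip-flop chain and the M-gated AND gates can be traced with a small software model. The sketch below is a simplification under stated assumptions: every flip-flop updates once per clock, the chain starts from an all-zero reset state, and the self-reset driven by done through nor21_0 is not modeled. The function name controller_trace is hypothetical, and m is expected to be 0 or 1.

```python
def controller_trace(m, clocks):
    """Return, per clock, the levels of cycle0, mcycle[1..7], nmcycle[1..3].

    Simplified model: the done-driven reset of the chain is ignored.
    """
    chain = [0] * 9  # dff_0 .. dff_8 after reset
    trace = []
    for _ in range(clocks):
        # dff_0 loads the constant 1; every other stage takes its predecessor.
        chain = [1] + chain[:-1]
        trace.append({
            "cycle0": chain[0],
            # and21_1, 3, 5, 6, 7, 8, 9: dff_k AND M
            "mcycle": [chain[k] & m for k in range(1, 8)],
            # and21_0, 2, 4: dff_k AND (NOT M)
            "nmcycle": [chain[k] & (1 - m) for k in range(1, 4)],
        })
    return trace
```

Because dff_0 is fed a constant 1, each stage stays high once set, so the control signals rise one clock apart rather than forming a one-hot pulse train. In this model the staggered rising edges are what clock the downstream latches.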
The sign judgment module comprises a two's-complement negation unit DD1, a 16-bit data latch and a D flip-flop dff_9;
The input DIN of DD1 and the input d2 of the 16-bit data latch are both connected to the input operand, and the output DOUT of DD1 is connected to the input d1 of the 16-bit data latch;
Bit 11 of the input operand, data[11], is connected to both the control input ctrl of the 16-bit data latch and the data input d of dff_9;
The cycle control signal cycle0 is connected to the clock input clk of the 16-bit data latch and the clock input clk of dff_9; the external reset signal rst is connected to the reset input rst of the 16-bit data latch and the reset input rst of dff_9;
The output d3 of the 16-bit data latch provides the absolute value data_abs[15:0] of the input operand, and the output q of dff_9 provides the sign neg of the input operand.
The negation unit comprises a two's-complement negation unit DD2 and a D flip-flop dff_21;
The input DIN of DD2 is connected to the result output by the floating-point adder, and its output DOUT is connected to the data input d of dff_21;
The clock input clk of dff_21 is connected to mcycle[5], and its reset input rst is connected to the external reset signal rst; the output q of dff_21 provides the negative of the floating-point adder's result.
The two's-complement negation units DD1 and DD2 operate identically, as follows:
Step 1: denote the data received at input DIN as DIN[15:0]; compute the two's-complement negation of the mantissa of DIN[15:0], denoted D_MID[13:0], then carry out steps 2, 3 and 4 in parallel to evaluate D_MID[13:0];
Step 2: determine whether D_MID[13:0] is 0; if so, go to step 5;
Step 3: determine whether D_MID[13:0] has overflowed, where D_MID[13] xor D_MID[12] = 1 indicates overflow; if it has overflowed, go to step 6;
Step 4: if D_MID[13] xor D_MID[12] = 0, go to step 7;
Step 5: set D_MID[17:14] to the most negative value so as to obtain the correct representation of 0, then carry out steps 8, 9 and 10 in parallel to evaluate D_MID[17:14];
Step 6: shift D_MID[13:0] right by one bit and set the exponent D_MID[17:14] = DIN[15:12] plus 1, then carry out steps 8, 9 and 10 in parallel to evaluate D_MID[17:14];
Step 7: shift D_MID[13:0] left by K bits and set the exponent D_MID[17:14] = DIN[15:12] minus K, then carry out steps 8, 9 and 10 in parallel to evaluate D_MID[17:14];
Step 8: determine whether the exponent result D_MID[17:14] has overflowed in the positive direction; if so, go to step 11;
Step 9: determine whether the exponent result D_MID[17:14] has overflowed in the negative direction; if so, go to step 12;
Step 10: determine whether the exponent result D_MID[17:14] is within range; if so, go to step 13;
Step 11: if D_MID[13:0] > 0, set D_MID to the largest positive floating-point value; if D_MID[13:0] < 0, set D_MID to the largest-magnitude negative floating-point value; go to step 13;
Step 12: set D_MID[13:0] to 0 and D_MID[17:14] to -8; go to step 13;
Step 13: remove the sign-extension bit and the implicit bit from the 18-bit data D_MID and output the result at DOUT, giving the 16-bit floating-point data DOUT[15:0], where DOUT[15:12] = D_MID[17:14], DOUT[11] = D_MID[12] and DOUT[10:0] = D_MID[10:0].
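The overflow test in step 3 relies on a duplicated sign bit. The sketch below illustrates only that check, assuming the 14-bit word is a 13-bit two's-complement mantissa with one extra copy of its sign bit; that layout is an interpretation of the text, and the helper names are hypothetical. It does not reproduce the full negation procedure (normalization, exponent adjustment and saturation are omitted).

```python
MASK14 = (1 << 14) - 1


def sign_extend_13_to_14(value13):
    """Duplicate the sign bit of a 13-bit two's-complement word."""
    sign = (value13 >> 12) & 1
    return (sign << 13) | (value13 & 0x1FFF)


def negate14(value14):
    """Two's-complement negation within 14 bits."""
    return (-value14) & MASK14


def overflowed(value14):
    """Overflow flag as in step 3: bit 13 XOR bit 12."""
    return ((value14 >> 13) ^ (value14 >> 12)) & 1
```

Negating the most negative 13-bit value (0x1000) gives a result that no longer fits in 13 bits, so the two top bits of the 14-bit word differ and the flag is raised; negating an ordinary value such as 0x0800 leaves both top bits equal.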
The range detection module comprises floating-point comparators fcom_1, fcom_2 and fcom_3; inverters inv_3, inv_4 and inv_5; a two-input OR gate or21_2; three-input OR gates or31_0 and or31_1; and D flip-flops dff_11, dff_12, dff_13 and dff_14;
The first input of floating-point comparator fcom_1 is connected to the data data_abs[15:0] output by the sign judgment module, and its second input to the constant C0, where C0 is 5.0000 (decimal);
The first input of floating-point comparator fcom_2 is connected to the data data_abs[15:0] output by the sign judgment module, and its second input to the constant C1, where C1 is 1.0000 (decimal);
The first input of floating-point comparator fcom_3 is connected to the data data_abs[15:0] output by the sign judgment module, and its second input to the constant C2, where C2 is 1.0000 (decimal);
The output of fcom_1 is connected to both the data input d of dff_11 and the input of inverter inv_3. The output of inv_3 is connected to the first inputs of the two-input OR gate or21_2 and the three-input OR gates or31_0 and or31_1. The output of fcom_2 is connected to both the input of inverter inv_4 and the second input of or21_2; the output of or21_2 is connected to the data input d of dff_12. The output of inv_4 is connected to the second inputs of or31_0 and or31_1. The output of fcom_3 is connected to both the third input of or31_0 and the input of inverter inv_5; the output of inv_5 is connected to the third input of or31_1. The output of or31_0 is connected to the data input d of dff_13, and the output of or31_1 to the data input d of dff_14;
The reset inputs rst of dff_11, dff_12, dff_13 and dff_14 are connected to the external reset signal rst, and their clock inputs clk are connected to mcycle[1]. The output q of dff_14 provides range[0], the output q of dff_13 provides range[1], the output q of dff_12 provides range[2], and the output q of dff_11 provides range[3].
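Read as Boolean logic, the gate network above produces an active-low one-hot code whenever the three thresholds are strictly decreasing. The sketch below is a hypothetical software rendering of that combinational logic, using the comparator convention stated in the comparator workflow (output 1 when the data is less than the constant). The thresholds are passed in as parameters because the translated text lists both C1 and C2 as 1.0000; any distinct value supplied for C1 is an assumption of the caller, and the flip-flops clocked by mcycle[1] are not modeled.

```python
def less_than(x, constant):
    """Comparator convention from the text: 1 if x < constant, else 0."""
    return 1 if x < constant else 0


def range_signals(abs_x, c0, c1, c2):
    """Return [range3, range2, range1, range0] per the gate network."""
    f1, f2, f3 = (less_than(abs_x, c) for c in (c0, c1, c2))
    n1, n2, n3 = 1 - f1, 1 - f2, 1 - f3  # inv_3, inv_4, inv_5
    range3 = f1              # fcom_1 output feeds dff_11 directly
    range2 = n1 | f2         # or21_2 feeds dff_12
    range1 = n1 | n2 | f3    # or31_0 feeds dff_13
    range0 = n1 | n2 | n3    # or31_1 feeds dff_14
    return [range3, range2, range1, range0]
```

With illustrative thresholds 5.0, 2.375 and 1.0, an input of 3.0 yields `[1, 0, 1, 1]`: exactly one signal is low, marking the interval between the second and first thresholds.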
The address generator comprises four 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4;
The cin terminal of addrgen1 is grounded; the cout terminal of addrgen1 is connected to the cin terminal of addrgen2, the cout terminal of addrgen2 to the cin terminal of addrgen3, and the cout terminal of addrgen3 to the cin terminal of addrgen4. The clk terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to nmcycle[1], their rst terminals to the external reset signal rst, and their d[15:0] terminals to the data output by the sign judgment module. On addrgen1, the d3[15:0] terminal is connected to the floating-point representation of decimal 8, d2[15:0] to that of 7.5, d1[15:0] to that of 7 and d0[15:0] to that of 6.5; the output out[3:0] of addrgen1 provides address bits addr[16:13];
On addrgen2, the d3[15:0] terminal is connected to the floating-point representation of decimal 6, d2[15:0] to that of 5.5, d1[15:0] to that of 5 and d0[15:0] to that of 4.5; the output out[3:0] of addrgen2 provides address bits addr[12:9];
On addrgen3, the d3[15:0] terminal is connected to the floating-point representation of decimal 4, d2[15:0] to that of 3.5, d1[15:0] to that of 3 and d0[15:0] to that of 2.5; the output out[3:0] of addrgen3 provides address bits addr[8:5];
On addrgen4, the d3[15:0] terminal is connected to the floating-point representation of decimal 2, d2[15:0] to that of 1.5, d1[15:0] to that of 1 and d0[15:0] to that of 0.5; the output out[3:0] of addrgen4 provides address bits addr[4:1], and the cout terminal of addrgen4 provides addr[0].
The 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4 have the same structure. Each comprises floating-point comparators fcom_4, fcom_5, fcom_6 and fcom_7; inverters inv_6, inv_7, inv_8 and inv_9; a two-input OR gate or21_3; a three-input OR gate or31_2; a four-input OR gate or41_0; five-input OR gates or51_0 and or51_1; and D flip-flops dff_16, dff_17, dff_18 and dff_19;
Floating-point comparators fcom_4, fcom_5, fcom_6 and fcom_7 each have two inputs;
The first input of fcom_4 serves as the d[15:0] terminal of the 4-bit address generation module, and its second input as the d3[15:0] terminal of the module;
The first input of fcom_5 is connected to the first input of fcom_4, and its second input serves as the d2[15:0] terminal of the module;
The first input of fcom_6 is connected to the first input of fcom_4, and its second input serves as the d1[15:0] terminal of the module;
The first input of fcom_7 is connected to the first input of fcom_4, and its second input serves as the d0[15:0] terminal of the module;
The first inputs of the two-input OR gate or21_3, the three-input OR gate or31_2, the four-input OR gate or41_0 and the five-input OR gates or51_0 and or51_1 are tied together and serve as the cin terminal of the module;
The output of fcom_4 is connected to both the input of inverter inv_6 and the second input of or21_3; the output of or21_3 is connected to the input d of dff_16;
The output of fcom_5 is connected to both the input of inverter inv_7 and the third input of or31_2; the output of inv_6 is connected to the second input of or31_2; the output of or31_2 is connected to the input d of dff_17;
The output of fcom_6 is connected to both the input of inverter inv_8 and the fourth input of or41_0; the output of inv_6 is connected to the second input of or41_0 and the output of inv_7 to its third input; the output of or41_0 is connected to the input d of dff_18;
The output of fcom_7 is connected to both the input of inverter inv_9 and the fifth input of or51_0; the output of inv_6 is connected to the second input of or51_0, the output of inv_7 to its third input and the output of inv_8 to its fourth input; the output of or51_0 is connected to the input d of dff_19;
The output of inv_6 is connected to the second input of or51_1, the output of inv_7 to its third input, the output of inv_8 to its fourth input and the output of inv_9 to its fifth input; the output of or51_1 serves as the cout terminal of the module;
The reset inputs rst of dff_16, dff_17, dff_18 and dff_19 are tied together and serve as the rst terminal of the module, and their clock inputs clk are tied together and serve as the clk terminal of the module. dff_16 outputs address bit 3, dff_17 outputs address bit 2, dff_18 outputs address bit 1 and dff_19 outputs address bit 0.
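The cascaded modules can be read as a priority chain that marks one of 17 half-unit intervals with a single low bit. The following sketch renders the combinational part of that structure in software, using the thresholds listed for the four modules and the comparator convention of output 1 when the data is less than the constant. The function names addrgen and address17 are hypothetical, Python floats replace the 16-bit format, and the flip-flops clocked by nmcycle[1] are not modeled.

```python
def addrgen(abs_x, d3, d2, d1, d0, cin):
    """One 4-bit module: returns ([bit3, bit2, bit1, bit0], cout)."""
    f = [1 if abs_x < d else 0 for d in (d3, d2, d1, d0)]  # fcom_4..fcom_7
    n = [1 - v for v in f]                                  # inv_6..inv_9
    bit3 = cin | f[0]                                       # or21_3
    bit2 = cin | n[0] | f[1]                                # or31_2
    bit1 = cin | n[0] | n[1] | f[2]                         # or41_0
    bit0 = cin | n[0] | n[1] | n[2] | f[3]                  # or51_0
    cout = cin | n[0] | n[1] | n[2] | n[3]                  # or51_1
    return [bit3, bit2, bit1, bit0], cout


def address17(abs_x):
    """Cascade addrgen1..addrgen4; returns the list addr[16], ..., addr[0]."""
    bits, carry = [], 0  # cin of addrgen1 is grounded
    for top in (8.0, 6.0, 4.0, 2.0):
        out, carry = addrgen(abs_x, top, top - 0.5, top - 1.0, top - 1.5, carry)
        bits.extend(out)
    return bits + [carry]  # cout of addrgen4 drives addr[0]
```

In this reading, `address17(0.7)` has its only low bit at addr[1], the interval from 0.5 up to 1, while any input of 8 or more pulls addr[16] low and forces every lower bit high through the cin chain.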
Floating-point comparators fcom_1, fcom_2, fcom_3, fcom_4, fcom_5, fcom_6 and fcom_7 share the same workflow, which is as follows:
Step 1: compare the mantissas of the first-input data and the second-input constant:
According to the floating-point data format, restore the implicit bit of each mantissa and extend it by one sign bit;
Subtract the mantissa of the second-input constant from the mantissa of the first-input data; denote the result dm[13:0];
Step 2: compare the exponents of the first-input data and the second-input constant:
Extend each exponent by one sign bit and subtract the exponent of the second-input constant from the exponent of the first-input data; denote the result de[4:0];
Step 3: determine the sign of the mantissa comparison result dm[13:0]:
Compute sdm as sdm = dm[13] xor dm[12];
When sdm is 0, dm[13:0] is positive; when sdm is 1, dm[13:0] is negative;
Step 4: determine the sign of the exponent comparison result de[4:0]:
Compute sde as sde = de[4] xor de[3];
When sde is 0, de[4:0] is positive; when sde is 1, de[4:0] is negative;
Step 5: determine the comparison result. If sde is 0 and de[4:0] is not 0, the first-input data is greater than the second-input constant and the output is 0. If sde is 0, de[4:0] is 0 and sdm is 0, the first-input data is greater than the second-input constant and the output is 0. If sde is 0, de[4:0] is 0 and sdm is 1, the first-input data is less than the second-input constant and the output is 1. If sde is 1, the first-input data is less than the second-input constant and the output is 1.
The parameter register comprises four 32-bit tri-state control gates and a 32-bit D flip-flop dff_15. The control input of the j-th 32-bit tri-state control gate is connected to bit j of the range-interval identification signal output by the range detection module, j = 0, 1, 2, 3. The reset inputs of the four 32-bit tri-state control gates are all connected to the external reset signal rst, and their outputs are connected to the data input d of dff_15. The reset input rst of dff_15 is connected to the external reset signal rst and its clock input clk to mcycle[2]. The upper 16 bits of the output q of dff_15 are the first-order coefficient A[15:0], and the lower 16 bits are the offset B[15:0].
The look-up table comprises 33 16-bit tri-state control gates, 33 two-input OR gates, 16 inverters and one 16-bit D flip-flop dff_20;
The second inputs of the first 17 two-input OR gates are connected to the sign output by the sign judgment module; among these first 17 OR gates, the first input of the k1-th OR gate is connected to bit k1 of the address value generated by the address generator, k1 = 0, 1, 2, ..., 16;
The last 16 two-input OR gates correspond one-to-one with the 16 inverters: the second input of each is connected to the output of its inverter, and the inputs of the 16 inverters are connected to the sign output by the sign judgment module. Among these last 16 OR gates, the first input of the k2-th OR gate is connected to bit (k2 - 16) of the address value generated by the address generator, k2 = 17, 18, 19, ..., 32;
The output of each two-input OR gate is connected to the second input of its corresponding 16-bit tri-state control gate. The first inputs of the 33 16-bit tri-state control gates and the reset input rst of dff_20 are connected to the external reset signal rst. The outputs of the 33 16-bit tri-state control gates are tied together and connected to the data input d of dff_20. The clock input clk of dff_20 is connected to nmcycle[2], and the output q of dff_20 provides the tanh activation function value lut_data[15:0] corresponding to the input operand.
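The enable logic of the 33 table entries can be sketched as follows. Several points are assumptions rather than statements from the text: the address bits and the OR-gate outputs are treated as active low, the sign flag is taken to be 1 for negative operands, and each stored value is taken to be tanh at the lower bound of its half-unit interval. The names build_table and select_entry are hypothetical.

```python
import math


def build_table():
    """33 hypothetical entries: 0..16 for x >= 0, 17..32 for x < 0.

    Entry k (k <= 16) covers |x| in [0.5*k, 0.5*(k+1)); the stored value
    is tanh at the interval's lower bound, an assumption for illustration.
    """
    positive = [math.tanh(0.5 * k) for k in range(17)]
    negative = [-math.tanh(0.5 * k) for k in range(1, 17)]
    return positive + negative


def select_entry(addr, neg):
    """addr[k] is the level of address bit k (active low); neg is the sign.

    Mirrors the OR-gate enables: gate k (k <= 16) is addr[k] | neg,
    gate k (k >= 17) is addr[k - 16] | (1 - neg). A low output enables.
    """
    enables = [addr[k] | neg for k in range(17)]
    enables += [addr[k - 16] | (1 - neg) for k in range(17, 33)]
    low = [k for k, level in enumerate(enables) if level == 0]
    return low[0] if low else None
```

With address bit 3 low, a positive operand selects entry 3 and a negative operand selects entry 19. Under this reading no gate is enabled for a negative operand in the lowest interval (address bit 0 low), which the sketch reports as None; the text does not say how the device resolves that case.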
The first gated latch comprises 16 structurally identical gating latch units;
The p-th gating latch unit comprises a selector mux_1, an inverter inv_2 and a D flip-flop dff_22. Input A of mux_1 is connected to bit p of the floating-point adder's result, and input B to bit p of the negation unit's result; the output of mux_1 is connected to the d input of dff_22. The input of inv_2 and the AS terminal of mux_1 are both connected to the sign of the data output by the sign judgment module, and the output of inv_2 is connected to the BS terminal of mux_1. The clock input clk of dff_22 is connected to mcycle[6] and its reset input rst to the external reset signal; the output of dff_22 provides bit p of the first gated latch's output data, p = 0, 1, 2, 3, ..., 15.
The second gated latch comprises 16 structurally identical gating latch units. The q-th gating latch unit comprises a selector mux_2, an inverter inv_2' and a D flip-flop dff_23. Input A of mux_2 is connected to bit q of the look-up table's output, and input B to bit q of the first gated latch's output; the output of mux_2 is connected to the d input of dff_23. The input of inv_2' and the AS terminal of mux_2 are both connected to the configuration control signal M, and the output of inv_2' is connected to the BS terminal of mux_2. The clock input clk of dff_23 is connected to the cycle control signal of the controller and its reset input rst to the external reset signal; the output of dff_23 provides bit q of the second gated latch's output data.
The invention has the following advantages:
(1) The invention computes the tanh function value by table look-up and the sigmoid function value by linear-function fitting, and both computation schemes have a simple structure. Under timing control, the sigmoid and tanh data paths share some functional modules, which reduces area compared with implementing the two functions independently. The functional modules in the sigmoid and tanh data paths consume dynamic power only while their cycle control signal is active and none at other times, so the overall circuit power is low. In addition, the invention uses a synchronous clock design, which eases timing checking and verification and improves practicality for embedded applications.
(2) When a neural network activation function is computed with the invention, only one mode control signal, the clock, and the reset signal are needed to control generation of the function value, so the control flow is simple. If the device is used to compute only one of the two functions, it forms a computation pipeline, which improves the efficiency of activation function computation.
(3) The address generator module and the look-up table module can easily be extended according to the required tanh precision, to meet changing requirements on function precision.
Brief description of the drawings
Fig. 1 is a 16-bit floating-point data format;
Fig. 2 is the overall structure of the invention;
Fig. 3 is the controller structure;
Fig. 4 is the structure of the symbol decision module;
Fig. 5 is the flow chart of the opposite-number (two's complement) operation on a data value;
Fig. 6 is the structure of a gated latch, wherein (a) is a schematic of one bit of the first gated latch and (b) is a schematic of one bit of the second gated latch;
Fig. 7 is the structure of the range detection module;
Fig. 8 is the floating-point comparator flow chart;
Fig. 9 is the structure of the 4-word, 32-bit parameter register;
Fig. 10 is the floating-point multiplier operation flow chart;
Fig. 11 is the floating-point adder operation flow chart;
Fig. 12 is the address generator structure;
Fig. 13 is the structure of the 4-bit address generation module;
Fig. 14 is the structure of the look-up table;
Fig. 15 is the structure of the opposite number arithmetic unit.
Specific embodiments
For a clearer understanding of the invention, it is described in further detail below with reference to the drawings.
Given the characteristics of neural network data operations, the data processed in the invention are defined in the following format: each datum is a 16-bit binary floating-point number. Within the 16 bits, the 4 bits [15:12] are the exponent field, bit 11 is the sign bit, and bits [10:0] are the fraction of the floating-point number; an implied bit lies between bit 11 and bit 10. Every field of the 16-bit floating-point number is represented in two's complement, as shown in Fig. 1. Operations are performed with an implied binary point between bit 11 and bit 10; when the implied most significant non-sign bit is made explicit, it sits immediately to the left of the binary point. In this format the two's-complement floating-point number x is given by:
x = 01.f × 2^e if s = 0
x = 10.f × 2^e if s = 1
x = 0 if e = -8
In this short floating-point format, zero must be represented with the following reserved values:
e = -8
s = 0
f = 0
The range and precision of the floating-point format are:
Largest positive number: x = (2 - 2^(-11)) × 2^7 = 2.5594 × 10^2
Smallest positive number: x = 1 × 2^(-7) = 7.8125 × 10^(-3)
Smallest-magnitude negative number: x = (-1 - 2^(-11)) × 2^(-7) = -7.8163 × 10^(-3)
Largest-magnitude negative number: x = -2 × 2^7 = -2.5600 × 10^2
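The format above can be illustrated with a minimal decoding sketch. It assumes that the mantissa is 01.f when the sign bit is 0 and 10.f (two's complement, i.e. -2 + 0.f) when the sign bit is 1, and that an exponent of -8 encodes zero. The function name decode_short_float is hypothetical and does not appear in the patent.

```python
def decode_short_float(word: int) -> float:
    """Decode a 16-bit word: [15:12] exponent, [11] sign, [10:0] fraction."""
    word &= 0xFFFF
    e = (word >> 12) & 0xF
    if e >= 8:                 # 4-bit two's-complement exponent
        e -= 16
    s = (word >> 11) & 1
    f = word & 0x7FF
    if e == -8:                # reserved exponent: the value is zero
        return 0.0
    # Assumption: 01.f -> 1 + f/2^11 ; 10.f -> -2 + f/2^11
    mantissa = (-2.0 if s else 1.0) + f / 2048.0
    return mantissa * 2.0 ** e


if __name__ == "__main__":
    for w in (0x77FF, 0x9000, 0x9FFF, 0x7800, 0x8000):
        print(hex(w), decode_short_float(w))
```

Under these assumptions the four extreme encodings reproduce the range values listed above, for example 0x77FF decodes to (2 - 2^(-11)) × 2^7.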
On this basis, the invention proposes a configurable neural network activation function realization device which, as shown in Fig. 2, consists mainly of a controller and the data paths of the sigmoid and tanh functions. The device has 4 input signals: the external clock signal clk, the external reset signal rst, the configuration control signal M, and the activation function argument DATA[15:0] (the input operand, a 16-bit floating-point value in the format described above). It has 2 output signals: the activation function result result[15:0] (a 16-bit floating-point value) and the operation-complete status signal DONE.
For the sigmoid activation function, the configuration control signal M is set to 1. M, the external clock signal clk, and the external reset signal rst are input to the controller, which generates the control signals cycle0, mcycle[6:1], and result_clk for the 8 clock cycles of the sigmoid data path. The sigmoid data path consists of the symbol decision module, the range detection module, the parameter register, the floating-point multiplier, the floating-point adder, the opposite number arithmetic unit, the first gated latch, and the second gated latch. The sigmoid computation proceeds as follows. The argument DATA[15:0] first enters the symbol decision module, which determines its sign: if the input is positive the module outputs it unchanged, otherwise it outputs its opposite number, so the result data_abs[15:0] is the absolute value of the input. The module also outputs the sign neg of DATA, where neg = 1 means DATA is negative and neg = 0 means DATA is positive. data_abs[15:0] and neg are latched by the clock cycle control signal cycle0. data_abs[15:0] is then input to the range detection module. The value range of data_abs[15:0] is divided into 4 numerical intervals, and range detection determines which interval data_abs[15:0] lies in; the interval determines the closest-fitting linear function used to estimate the sigmoid value. Determining the interval also forms the register address rang[3:0] of the linear function's coefficient and offset, and the cycle control signal mcycle[1] latches this output. rang[3:0] is passed to the parameter register to select the parameters of the linear function approximating sigmoid, namely the first-order coefficient and the offset. The parameter register is 32 bits wide: the upper 16 bits are the first-order coefficient A[15:0] and the lower 16 bits are the offset B[15:0]. The cycle control signal mcycle[2] latches the parameter register outputs A[15:0] and B[15:0]. A[15:0] and data_abs[15:0] are input to the floating-point multiplier, and mcycle[3] latches its output AX[15:0]. AX[15:0] and B[15:0] are input to the floating-point adder, and mcycle[4] latches its output Y[15:0]. Y[15:0] is input to the opposite number arithmetic unit, which computes the opposite number NY[15:0], and mcycle[5] latches NY[15:0]. NY[15:0] and Y[15:0] are input to the first gated latch, which selects its output according to the sign neg from the symbol decision module: when neg = 0 it outputs the adder result Y[15:0], and when neg = 1 it outputs NY[15:0]. mcycle[6] latches the output sigmoid_result[15:0] of the first gated latch. sigmoid_result[15:0] is input to the second gated latch, which selects and latches either the sigmoid result or the tanh result according to M; when M = 1 it selects sigmoid_result[15:0]. The cycle control signal result_clk latches the output result[15:0] of the second gated latch, i.e. the sigmoid result. The controller then outputs DONE (active high), indicating that the sigmoid computation is complete.
For the tanh activation function, the configuration control signal M is set to 0. M, the external clock signal clk, and the reset signal rst are input to the controller, which generates the control signals cycle0, nmcycle[2:1], and result_clk for the 4 clock cycles of the tanh data path. The tanh data path consists of the symbol decision module, the address generator, the look-up table, and the second gated latch. The tanh computation proceeds as follows. The argument DATA first enters the symbol decision module, which determines its sign: if the input is positive the module outputs it unchanged, otherwise it outputs its opposite number, so the result data_abs[15:0] is the absolute value of the input. The module also outputs the sign neg of DATA, where neg = 1 means DATA is negative and neg = 0 means DATA is positive. data_abs[15:0] and neg are latched by the clock cycle control signal cycle0. data_abs[15:0] is then input to the address generator. The value range of data_abs[15:0] is divided into 2^n numerical intervals (n depends on the required tanh precision; n = 4 in this description). The address generator quickly determines which interval data_abs[15:0] lies in and forms the look-up table address addr[17:0] accordingly; the cycle control signal nmcycle[1] latches addr[17:0]. addr[17:0] is passed to the look-up table to select the stored tanh function value, and nmcycle[2] latches that value. The tanh function value tanh_result[15:0] is input to the second gated latch, which selects and latches according to M; when M = 0 it selects the tanh function value. The cycle control signal result_clk latches the output result[15:0] of the second gated latch, i.e. the tanh result. The controller then outputs DONE (active high), indicating that the tanh computation is complete.
As Fig. 2 shows, the symbol decision module and the second gated latch are the modules shared by the sigmoid and tanh data paths.
The input signals of the controller (see Fig. 2 and Fig. 3) are the configuration control signal M, the external clock signal clk, and the external reset signal rst; its output signals are the cycle control signals cycle0, mcycle[6:1], nmcycle[3:1], and result_clk, and the operation-complete signal DONE. Its function is to generate, according to the value of M, the cycle control signals for the selected activation function data path and the completion flag DONE: when M = 0, the controller generates the 4 latch control signals of the tanh data path; when M = 1, it generates the 8 latch control signals of the sigmoid data path.
The controller consists mainly of 9 D flip-flops of identical structure, as shown in Fig. 3. It comprises D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7, and dff_8; AND gates and21_0, and21_1, and21_2, and21_3, and21_4, and21_5, and21_6, and21_7, and21_8, and21_9, and and21_10; OR gates or21_0 and or21_1; the two-input NOR gate nor21_0; and inverters inv_0 and inv_1. The clock inputs clk of dff_0 through dff_8 are connected to the external clock signal clk. The data input d of dff_0 is tied to logic "1" (high level). The output q of dff_0 outputs the cycle control signal cycle0 and also drives the data input d of dff_1; the output q of dff_1 is connected to the data input d of dff_2; the output q of dff_2 to the data input d of dff_3; the output q of dff_3 to the data input d of dff_4; the output q of dff_4 to the data input d of dff_5; the output q of dff_5 to the data input d of dff_6; the output q of dff_6 to the data input d of dff_7; and the output q of dff_7 to the data input d of dff_8;
The input of inverter inv_0 is connected to the configuration control signal M. The output of inv_0 is connected to the first inputs of AND gates and21_0, and21_2, and and21_4. The output q of dff_1 is connected to the second input of and21_0 and the first input of and21_1. The configuration control signal M is connected to the second inputs of and21_1, and21_3, and21_5, and21_6, and21_7, and21_8, and21_9, and and21_10. The output q of dff_2 is connected to the second input of and21_2 and the first input of and21_3; the output q of dff_3 to the second input of and21_4 and the first input of and21_5; the output q of dff_4 to the first input of and21_6; the output q of dff_5 to the first input of and21_7; the output q of dff_6 to the first input of and21_8; the output q of dff_7 to the first input of and21_9; and the output q of dff_8 to the first input of and21_10;
The output of and21_0 is signal nmcycle[1]; the output of and21_1 is mcycle[1]; the output of and21_2 is nmcycle[2]; the output of and21_3 is mcycle[2]; the output of and21_4 is nmcycle[3]; the output of and21_5 is mcycle[3]; the output of and21_6 is mcycle[4]; the output of and21_7 is mcycle[5]; the output of and21_8 is mcycle[6]; and the output of and21_9 is mcycle[7];
The output of and21_6 is connected to the first input of OR gate or21_1, and the output of and21_10 to the second input of or21_1. The output of or21_1 outputs the operation-complete signal done and is also connected to the first input of the two-input NOR gate nor21_0. The input of inverter inv_1 is connected to the external reset signal rst, and the output of inv_1 is connected to the second input of nor21_0. The output of nor21_0 is connected to the reset inputs rst of dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7, and dff_8. The output of and21_5 is connected to the first input of OR gate or21_0, and the output of and21_9 to the second input of or21_0; the output of or21_0 outputs signal result_clk.
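The flip-flop chain and the AND gating described above can be sketched behaviourally as follows. This is only an illustration under simplifying assumptions: the done/reset feedback through or21_1 and nor21_0 and the result_clk gate are left out, the chain starts from an all-zero reset state, and the function name controller_enables is hypothetical.

```python
def controller_enables(m: int, edges: int):
    """Return (cycle0, nmcycle, mcycle) after `edges` rising clock edges.

    m is the configuration control signal M (0 = tanh, 1 = sigmoid).
    """
    q = [0] * 9                       # dff_0 .. dff_8 after reset
    for _ in range(edges):
        q = [1] + q[:-1]              # dff_0 loads '1', the rest shift along
    not_m = 1 - m                     # output of inverter inv_0
    cycle0 = q[0]
    # and21_0 / and21_2 / and21_4: gated by NOT M
    nmcycle = {i: not_m & q[i] for i in (1, 2, 3)}
    # and21_1, and21_3, and21_5 .. and21_9: gated by M
    mcycle = {i: m & q[i] for i in range(1, 8)}
    return cycle0, nmcycle, mcycle


if __name__ == "__main__":
    print(controller_enables(0, 2))   # tanh mode, two edges after reset
    print(controller_enables(1, 7))   # sigmoid mode, seven edges after reset
```

In this model each enable rises one clock edge after the previous one, and only the family selected by M (nmcycle for M = 0, mcycle for M = 1) ever becomes active.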
The input signals of the symbol decision module are the input data DATA[15:0], the cycle control signal cycle0, and the reset signal rst; its output signals are the sign flag neg and the absolute value data_abs[15:0]. Its function is as follows: when the sign bit (bit 11) of the input data, DATA[11], is 0, the input is positive, and it is latched and output when cycle0 is active; when DATA[11] = 1, the input is negative, and the two's-complement value of its opposite number is latched and output when cycle0 is active; the sign bit DATA[11] is latched when cycle0 is active and output as neg; when the reset signal rst is active, the output signals data_abs[15:0] and neg are set to 0.
The structure of the symbol decision module is shown in Fig. 4. The 16-bit input data are connected to the input DIN of the data opposite number complement arithmetic device DD1 and to the D2 input of the 16-bit data latch; the output DOUT of the data opposite number complement arithmetic device D1 is connected to the D1 input of the 16-bit data latch; bit 11 of the 16-bit input, DATA[11], is connected to the control signal ctrl of the latch and to the data input d of D flip-flop dff_9; the input cycle control signal cycle0 is connected to the clock input clk of the latch and to the clock input clk of dff_9; the external reset signal rst is connected to the reset input rst of the 16-bit data latch and to the reset input rst of dff_9.
The operation flow of the data opposite number complement arithmetic device (see Fig. 5) computes the two's complement of the opposite number of the 16-bit floating-point value DIN[15:0] in 13 steps.
Step 1: compute the two's complement of the opposite number of the mantissa of DIN[15:0]. Following the rules of floating-point arithmetic, DIN[15:0] is expanded into the 18-bit operand D[17:0]. The 4-bit exponent DIN[15:12] keeps its width and maps to D[17:14]; the 12-bit mantissa DIN[11:0] is extended to the 14 bits D[13:0], where D[10:0] = DIN[10:0], D[11] = ~DIN[11] is the implied bit of the floating-point format made explicit, D[12] = DIN[11] is the mantissa sign bit, and D[13] = DIN[11] is a one-bit sign extension of the mantissa. After expansion, D[13:0] is inverted bitwise and 1 is added at the least significant bit; the result is D_MID[13:0]. Steps 2, 3, and 4 are then carried out simultaneously to evaluate D_MID[13:0];
Step 2: determine whether the mantissa result D_MID[13:0] is 0; if so, go to step 5.
Step 3: determine whether the mantissa result D_MID[13:0] has overflowed; D_MID[13] xor D_MID[12] = 1 indicates overflow, in which case go to step 6.
Step 4: if D_MID[13:0] has not overflowed, i.e. D_MID[13] xor D_MID[12] = 0, go to step 7;
Step 5: when the mantissa result D_MID[13:0] is 0, set D_MID[17:14] to the most negative value to obtain the correct representation of 0, then carry out steps 8, 9, and 10 simultaneously to evaluate D_MID[17:14].
Step 6: when the mantissa result D_MID[13:0] has overflowed, shift the mantissa D_MID[13:0] right by one bit and add 1 to the exponent DIN[15:12], then carry out steps 8, 9, and 10 simultaneously to evaluate D_MID[17:14].
Step 7: when the mantissa result D_MID[13:0] has not overflowed, shift the mantissa D_MID[13:0] left by K bits and subtract K from the exponent DIN[15:12], then carry out steps 8, 9, and 10 simultaneously to evaluate D_MID[17:14].
Step 8: determine whether the exponent result D_MID[17:14] has overflowed in the positive direction; if so, go to step 11.
Step 9: determine whether the exponent result D_MID[17:14] has overflowed in the negative direction; if so, go to step 12.
Step 10: determine whether the exponent result D_MID[17:14] is within the representable range; if so, go to step 13.
Step 11: if D_MID[13:0] > 0, set D_MID to the largest positive floating-point value; if D_MID[13:0] < 0, set D_MID to the most negative floating-point value; go to step 13.
Step 12: set D_MID[13:0] to 0 and D_MID[17:14] to -8; go to step 13.
Step 13: convert the 18-bit operand D_MID to the 16-bit floating-point format by removing the sign extension bit and the implied bit, and output the 16-bit floating-point value DOUT[15:0], where DOUT[15:12] = D_MID[17:14], DOUT[11] = D_MID[12], and DOUT[10:0] = D_MID[10:0].
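The 13 steps can be followed in a compact software model. The sketch below is a behavioural approximation, not the circuit: it uses Python integers instead of fixed-width registers, assumes that any exponent below -7 counts as negative overflow (because -8 is reserved for zero), and assumes the left shift by K in step 7 continues until the implied bit differs from the sign. The name negate_short_float is hypothetical.

```python
def negate_short_float(word: int) -> int:
    """Return the 16-bit encoding of -x for a 16-bit short-float word x."""
    word &= 0xFFFF
    exp = (word >> 12) & 0xF
    exp = exp - 16 if exp >= 8 else exp          # two's-complement exponent
    s = (word >> 11) & 1
    f = word & 0x7FF
    # Step 1: signed mantissa with the implied bit explicit, in units of 2^-11
    mant = ((-2 if s else 1) << 11) + f
    mant = -mant                                 # invert and add 1
    if mant == 0:                                # steps 2 and 5
        exp = -8
    elif mant >= 4096 or mant < -4096:           # steps 3 and 6: overflow
        mant >>= 1
        exp += 1
    else:                                        # steps 4 and 7: normalise
        while not (2048 <= mant < 4096 or -4096 <= mant < -2048):
            mant <<= 1
            exp -= 1
    if exp > 7:                                  # steps 8 and 11: saturate
        mant = 4095 if mant > 0 else -4096
        exp = 7
    elif exp < -7:                               # steps 9 and 12: force zero
        mant, exp = 0, -8
    # Step 13: drop sign extension and implied bit, repack into 16 bits
    sign = 1 if mant < 0 else 0
    return ((exp & 0xF) << 12) | (sign << 11) | (mant & 0x7FF)


if __name__ == "__main__":
    for w in (0x0000, 0xF800, 0x0400, 0x7800, 0x8000):
        print(hex(w), "->", hex(negate_short_float(w)))
```

For instance, 0x0000 encodes 1.0 and its negation is 0xF800 (mantissa 10.0 with exponent -1), which shows why the normalising shift of step 7 is needed even for a simple sign change.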
The first gated latch and the second gated latch are both 16-bit parallel structures.
The first gated latch comprises 16 structurally identical gated latch units.
The p-th gated latch unit comprises selector mux_1, inverter inv_2, and D flip-flop dff_22. The A input of selector mux_1 is connected to bit p of the floating-point adder result, and the B input of mux_1 is connected to bit p of the opposite number arithmetic unit result; the output of mux_1 is connected to the d input of dff_22; the input of inv_2 and the AS terminal of mux_1 are both connected to the data sign output by the symbol decision module, and the output of inv_2 is connected to the BS terminal of mux_1; the clock input clk of dff_22 is connected to mcycle[6], the reset input rst of dff_22 is connected to the external reset signal, and the output of dff_22 outputs bit p of the output data of the first gated latch, p = 0, 1, 2, 3, ..., 15. See Fig. 6(a).
Its function is as follows: when the data sign neg output by the symbol decision module is 0, D1[p] is selected, and the selected data is latched when clk is active and output as D3[p]; when neg = 1, D2[p] is selected, and the selected data is latched when clk is active and output as D3[p].
The second gated latch comprises 16 structurally identical gated latch units.
The q-th gated latch unit comprises selector mux_2, inverter inv_2', and D flip-flop dff_23. The A input of selector mux_2 is connected to bit q of the look-up table output, and the B input of mux_2 is connected to bit q of the output of the first gated latch; the output of mux_2 is connected to the d input of dff_23; the input of inv_2' and the AS terminal of mux_2 are both connected to the configuration control signal M, and the output of inv_2' is connected to the BS terminal of mux_2; the clock input clk of dff_23 is connected to the cycle control signal of the controller, the reset input rst of dff_23 is connected to the external reset signal, and the output of dff_23 outputs bit q of the output data of the second gated latch. See Fig. 6(b).
The input signals of the range detection module are the timing control signal mcycle[1], the reset signal rst, and the 16-bit floating-point data data_abs; its output signal is the 4-bit range signal range[3:0]. Its function is as follows. When the reset signal is active, constant C0 is set to 5.0000 (decimal), constant C1 to 1.0000 (decimal), and constant C2 to 1.0000 (decimal). The input data are compared with C0, C1, and C2 simultaneously by the floating-point comparators. When data_abs is greater than C0, range[3] is 0 and range[2], range[1], and range[0] are 1. When data_abs is greater than C1 but not greater than C0, range[2] is 0 and range[3], range[1], and range[0] are 1. When data_abs is greater than C2 but not greater than C0 or C1, range[1] is 0 and range[3], range[2], and range[0] are 1. When data_abs is less than C0, C1, and C2, range[3], range[2], and range[1] are 1 and range[0] is 0. The structure of the range detection module is shown in Fig. 7: it comprises floating-point comparators fcom_1, fcom_2, and fcom_3; inverters inv_3, inv_4, and inv_5; the two-input OR gate or21_2; the three-input OR gates or31_0 and or31_1; and D flip-flops dff_11, dff_12, dff_13, and dff_14;
The first input of floating-point comparator fcom_1 is connected to the data data_abs[15:0] output by the symbol decision module, and its second input is connected to constant C0, where C0 is 5.0000 (decimal);
The first input of floating-point comparator fcom_2 is connected to the data data_abs[15:0] output by the symbol decision module, and its second input is connected to constant C1, where C1 is 1.0000 (decimal);
The first input of floating-point comparator fcom_3 is connected to the data data_abs[15:0] output by the symbol decision module, and its second input is connected to constant C2, where C2 is 1.0000 (decimal);
The output of fcom_1 is connected to the data input d of D flip-flop dff_11 and to the input of inverter inv_3. The output of inv_3 is connected to the first input of the two-input OR gate or21_2, the first input of the three-input OR gate or31_0, and the first input of the three-input OR gate or31_1. The output of fcom_2 is connected to the input of inverter inv_4 and to the second input of or21_2; the output of or21_2 is connected to the data input d of dff_12. The output of inv_4 is connected to the second input of or31_0 and the second input of or31_1. The output of fcom_3 is connected to the third input of or31_0 and to the input of inverter inv_5; the output of inv_5 is connected to the third input of or31_1. The output of or31_0 is connected to the data input d of dff_13, and the output of or31_1 is connected to the data input d of dff_14;
The reset inputs rst of dff_11, dff_12, dff_13, and dff_14 are connected to the external reset signal rst, and their clock inputs clk are connected to mcycle[1]. The output q of dff_14 outputs signal range[0], the output q of dff_13 outputs range[1], the output q of dff_12 outputs range[2], and the output q of dff_11 outputs range[3].
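The gate network of the range detection module can be checked with a small combinational model. The sketch assumes each comparator outputs 0 when its first input is greater than or equal to the constant and 1 otherwise, and it omits the output flip-flops clocked by mcycle[1]. The threshold c1 = 2.375 is a hypothetical value chosen only so that all four intervals are non-empty; the text lists C1 and C2 both as 1.0000. The names fcom and range_detect are illustrative.

```python
def fcom(x: float, c: float) -> int:
    """Comparator model: 0 if x >= c, 1 if x < c (assumed polarity)."""
    return 0 if x >= c else 1


def range_detect(x: float, c0: float = 5.0, c1: float = 2.375,
                 c2: float = 1.0) -> int:
    """Return range[3:0] as an integer; exactly one bit is 0 (active low)."""
    o0, o1, o2 = fcom(x, c0), fcom(x, c1), fcom(x, c2)
    n0, n1, n2 = 1 - o0, 1 - o1, 1 - o2      # inv_3, inv_4, inv_5
    r3 = o0                                  # d input of dff_11
    r2 = n0 | o1                             # or21_2 -> dff_12
    r1 = n0 | n1 | o2                        # or31_0 -> dff_13
    r0 = n0 | n1 | n2                        # or31_1 -> dff_14
    return (r3 << 3) | (r2 << 2) | (r1 << 1) | r0


if __name__ == "__main__":
    for x in (6.0, 3.0, 1.5, 0.5):
        print(x, format(range_detect(x), "04b"))
```

With these assumptions the outputs form an active-low one-hot code, which fits a parameter register in which each range bit enables one tri-state gate.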
The floating-point comparator flow (see Fig. 8) compares the 16-bit value data_abs[15:0] with a constant Ci (i = 0, 1, 2) in 5 steps.
Step 1: compare the mantissas of the first-input data data_abs and the second-input constant Ci, where i = 0, 1, 2:
According to the floating-point data format, the implied bit is added to each mantissa and one sign bit is extended;
The mantissa of Ci is subtracted from the mantissa of data_abs, and the result is denoted dm[13:0];
Step 2: compare the exponents of the first-input data data_abs and the second-input constant Ci:
One sign bit is extended, the exponent of Ci is subtracted from the exponent of data_abs, and the result is denoted de[4:0];
Step 3: determine the sign of the mantissa comparison result dm[13:0]:
sdm is computed as sdm = dm[13] xor dm[12];
When sdm is 0, dm[13:0] is positive; when sdm is 1, dm[13:0] is negative;
Step 4: determine the sign of the exponent comparison result de:
sde is computed as sde = de[4] xor de[3];
When sde is 0, de[4:0] is positive; when sde is 1, de[4:0] is negative;
Step 5: determine the comparison result. If sde is 0 and de[4:0] is not 0, data_abs is greater than Ci and the output is 0. If sde is 0, de[4:0] is 0, and sdm is 0, data_abs is greater than Ci and the output is 0. If sde is 0, de[4:0] is 0, and sdm is 1, data_abs is less than Ci and the output is 1. If sde is 1, data_abs is less than Ci and the output is 1.
The 4-word, 32-bit parameter register (see Fig. 2 and Fig. 9) takes as inputs the range interval identification signal range[3:0], the clock signal clk and the reset signal rst; its outputs are the first-order coefficient A[15:0] and the offset B[15:0] of the linear function that approximates the sigmoid function. Its function is as follows: when rst is active, the parameter register is initialized, and each of its four 32-bit words is set to the first-order coefficient and offset of the linear function for one of the four intervals into which the argument range of the sigmoid function is divided. In the order given by the range interval identification signal range[3:0], the pairs (A, B)₁₀ are: (0.0000, 0.0000)₁₀; (0.0313, 0.8438)₁₀; (0.1250, 0.6250)₁₀; (0.2500, 0.5000)₁₀. The 32-bit word selected by range supplies the parameters of the linear function approximating the sigmoid activation function, namely the first-order coefficient A[15:0] and the offset B[15:0], which are latched to the output on the clock signal. The parameter register comprises four 32-bit tri-state control gates and one 32-bit D flip-flop dff_15. The control input of the j-th 32-bit tri-state control gate is connected to bit j of the range interval identification signal output by the range detection module, j = 0, 1, 2, 3. The reset inputs of the four 32-bit tri-state control gates are all connected to the externally input reset signal rst, and their outputs are connected to the data input d of dff_15. The reset input rst of dff_15 is connected to the externally input reset signal rst, and its clock input clk is connected to mcycle[2]. Of the output q of dff_15, the upper 16 bits are the first-order coefficient A[15:0] and the lower 16 bits are the offset B[15:0].
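The selection of (A, B) and the resulting linear approximation can be illustrated with a short sketch. It rests on two assumptions that the text does not state explicitly: that range[3:0] is active low (the selected interval is the bit that is 0), and that the four pairs are listed from range[3] down to range[0]. Python floats replace the 16-bit format, and the names are hypothetical.

```python
import math

# (A, B) pairs as listed in the text, keyed by the assumed range bit index.
PARAMS = {
    3: (0.0000, 0.0000),
    2: (0.0313, 0.8438),
    1: (0.1250, 0.6250),
    0: (0.2500, 0.5000),
}


def select_params(range_bits):
    """Return (A, B) for the single cleared bit of range[3:0]."""
    for j in range(3, -1, -1):
        if (range_bits >> j) & 1 == 0:
            return PARAMS[j]
    raise ValueError("no interval selected")


def linear_sigmoid(data_abs, range_bits):
    """Evaluate A * |x| + B for the selected interval."""
    a, b = select_params(range_bits)
    return a * data_abs + b


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Compare the approximation with the exact value for a small input.
print(linear_sigmoid(0.5, 0b1110), sigmoid(0.5))
```

The sketch evaluates only the three linear segments; how the device forms its output for the word listed as (0.0000, 0.0000) is not modeled here.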
The floating-point multiplier flow (see Fig. 10) multiplies the two 16-bit floating-point values A[15:0] and data_abs[15:0] and takes 5 steps.
Step 1, operand preparation: convert the 16-bit floating-point value A[15:0] into the 18-bit operand AIN[17:0] as follows:
AIN[17:14] = A[15:12], AIN[13] = A[11], AIN[12] = A[11], AIN[11] = ~A[11], AIN[10:0] = A[10:0];
At the same time, convert the 16-bit floating-point value data_abs[15:0] into the 18-bit operand XIN[17:0]. If range[3] = 1, XIN[17:0] is formed as follows: XIN[17:14] = data_abs[15:12], XIN[13] = data_abs[11], XIN[12] = data_abs[11], XIN[11] = ~data_abs[11], XIN[10:0] = data_abs[10:0]; if range[3] = 0, XIN[17:0] is set to the constant 1.
Step 2, multiplication: multiply the mantissas of AIN and XIN to obtain the mantissa of AXIN: AXIN[27:0] = AIN[13:0] × XIN[13:0];
Add the exponents of AIN and XIN to obtain the exponent of AXIN: AXIN[31:28] = AIN[17:14] + XIN[17:14].
Step 3, check of the mantissa product AXIN[27:0]: if AXIN[27:14] = 0, set AXIN[31:28] = -8; if AXIN[27:14] >> 1 is a normalized number, set {AXIN[27:14] = AXIN[27:14] >> 1, AXIN[31:28] = AXIN[31:28] + 1}; if AXIN[27:14] >> 2 is a normalized number, set {AXIN[27:14] = AXIN[27:14] >> 2, AXIN[31:28] = AXIN[31:28] + 2}; otherwise AXIN[27:14] is already a normalized number.
Step 4, check of the exponent result AXIN[31:28]: if AXIN[31:28] overflows and AXIN[27:14] > 0, set {AXIN[31:28] = 7}; if AXIN[31:28] overflows and AXIN[27:14] < 0, set {AXIN[31:28] = -8}; if AXIN[31:28] underflows, set {AXIN[31:28] = -8, AXIN[27:14] = 0}; otherwise AXIN[31:28] is within the value range.
Step 5, latching of the product: if rst = 1 and a rising edge of clk occurs, latch {AX[15:12] = AXIN[31:28], AX[11] = AXIN[27], AX[10:0] = AXIN[25:15]} and output the result AX[15:0] = A[15:0] × data_abs[15:0].
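Steps 1 and 2 are plain bit manipulation and can be sketched directly. The code below assumes that the 4-bit exponent and the extended 14-bit mantissa are two's-complement values, which is suggested by the exponent limits -8 and 7 in the text but is not stated outright; the helper names are hypothetical, and normalization, saturation and latching (steps 3 to 5) are left out.

```python
def to_arith18(v):
    """Step 1: expand a 16-bit value into the 18-bit operand format."""
    exp = (v >> 12) & 0xF        # bits [15:12] -> [17:14]
    s = (v >> 11) & 1            # bit [11]: copied to [13] and [12]
    frac = v & 0x7FF             # bits [10:0]
    return (exp << 14) | (s << 13) | (s << 12) | ((s ^ 1) << 11) | frac


def signed(value, bits):
    """Interpret an unsigned field as a two's-complement number."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def multiply_raw(a, x):
    """Step 2: exponent sum and raw mantissa product, as Python integers."""
    ain, xin = to_arith18(a), to_arith18(x)
    man = signed(ain & 0x3FFF, 14) * signed(xin & 0x3FFF, 14)
    exp = signed(ain >> 14, 4) + signed(xin >> 14, 4)
    return exp, man


print(hex(to_arith18(0xF800)), multiply_raw(0x1000, 0xF800))
```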
The floating-point adder flow (see Fig. 11) adds the two 16-bit floating-point values AX[15:0] and B[15:0] and takes 7 steps.
Step 1, operand preparation: convert the 16-bit floating-point value AX[15:0] into the 18-bit operand AXIN[17:0] as follows: AXIN[17:14] = AX[15:12], AXIN[13] = AX[11], AXIN[12] = AX[11], AXIN[11] = ~AX[11], AXIN[10:0] = AX[10:0].
Convert the 16-bit floating-point value B[15:0] into the 18-bit operand BIN[17:0] as follows: BIN[17:14] = B[15:12], BIN[13] = B[11], BIN[12] = B[11], BIN[11] = ~B[11], BIN[10:0] = B[10:0].
Step 2: compute the exponent difference of the two floating-point operands: N = AXIN[17:14] - BIN[17:14].
Step 3: align the mantissas of the two floating-point operands according to the exponent difference: if N > 0, then {BIN[13:0] = BIN[13:0] >> N, YIN[17:14] = AXIN[17:14]}; otherwise {AXIN[13:0] = AXIN[13:0] >> |N|, YIN[17:14] = BIN[17:14]}.
Step 4: add the mantissas of the two floating-point operands: YIN[13:0] = AXIN[13:0] + BIN[13:0].
Step 5, check of the mantissa sum YIN[13:0]: if YIN[13:0] = 0, set YIN[17:14] = -8; if YIN[13:0] overflows, set {YIN[13:0] = YIN[13:0] >> 1, YIN[17:14] = YIN[17:14] + 1}; if YIN[13:0] << K is a normalized number, set {YIN[13:0] = YIN[13:0] << K, YIN[17:14] = YIN[17:14] - K}.
Step 6, check of the exponent of the sum YIN[17:14]: if YIN[17:14] overflows and YIN[13:0] > 0, set {YIN[17:14] = 7}; if YIN[17:14] overflows and YIN[13:0] < 0, set {YIN[17:14] = -8}; if YIN[17:14] underflows, set {YIN[17:14] = -8, YIN[13:0] = 0}; otherwise YIN[17:14] is within the value range.
Step 7, latching of the sum: if rst = 1 and a rising edge of clk occurs, latch {Y[15:12] = YIN[17:14], Y[11] = YIN[13], Y[10:0] = YIN[10:0]} and output Y[15:0] = AX[15:0] + B[15:0].
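The alignment, addition and normalization in steps 2 to 5 can be sketched on signed Python integers. The sketch makes two assumptions: "overflow" of the 14-bit mantissa means that the extension bit and the sign bit differ (the value no longer fits in 13 signed bits), and a mantissa is "normalized" when the bit below the sign bit is the inverse of the sign bit, as in the operand format of step 1. The function names are hypothetical; exponent saturation and latching (steps 6 and 7) are omitted.

```python
def align_and_add(exp_a, man_a, exp_b, man_b):
    """Steps 2-4: shift the operand with the smaller exponent, then add."""
    n = exp_a - exp_b
    if n > 0:
        man_b >>= n          # arithmetic right shift keeps the sign
        exp = exp_a
    else:
        man_a >>= -n
        exp = exp_b
    return exp, man_a + man_b


def is_normalized(man):
    """True when the implied bit is the inverse of the sign bit."""
    return 2048 <= man <= 4095 or -4096 <= man <= -2049


def normalize(exp, man):
    """Step 5: zero check, overflow correction, or left shift by K."""
    if man == 0:
        return -8, 0
    if not -4096 <= man <= 4095:     # extension bit differs from sign bit
        return exp + 1, man >> 1
    k = 0
    while not is_normalized(man):
        man <<= 1
        k += 1
    return exp - k, man


print(normalize(*align_and_add(0, 2048, 0, 2048)))
```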
The address generator (see Fig. 12 and Fig. 13) takes as inputs the clock nmcycle[1], the reset signal rst and the 16-bit data data_abs[15:0]; its output is the 17-bit address addr[16:0]. Its function is to generate the address of the look-up table: it produces a 17-bit address value according to the interval in which the data value lies. The 16-bit input data data_abs[15:0] is compared simultaneously in the four 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4. When 6.5 ≤ data_abs[15:0] ≤ 8, one bit of the output addr[3:0] of addrgen1 is 0 and the address outputs of the other address generation modules are 1; when 4.5 ≤ data_abs[15:0] ≤ 6, one bit of the output addr[3:0] of addrgen2 is 0 and the address outputs of the other address generation modules are 1; when 2.5 ≤ data_abs[15:0] ≤ 4, one bit of the output addr[3:0] of addrgen3 is 0 and the address outputs of the other address generation modules are 1; when 0.5 ≤ data_abs[15:0] ≤ 2, one bit of the output addr[3:0] of addrgen4 is 0 and the address outputs of the other address generation modules are 1. The structure of the address generator is shown in Fig. 12: the address generator comprises four 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4;
The cin terminal of addrgen1 is grounded, the cout terminal of addrgen1 is connected to the cin terminal of addrgen2, the cout terminal of addrgen2 is connected to the cin terminal of addrgen3, and the cout terminal of addrgen3 is connected to the cin terminal of addrgen4; the clk terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to nmcycle[1]; their rst terminals are connected to the externally input reset signal rst; their d[15:0] terminals are connected to the data output by the sign judgment module; the d3[15:0] terminal of addrgen1 is connected to the floating-point number corresponding to the constant (8)₁₀, its d2[15:0] terminal to the floating-point number corresponding to (7.5)₁₀, its d1[15:0] terminal to the floating-point number corresponding to (7)₁₀, and its d0[15:0] terminal to the floating-point number corresponding to (6.5)₁₀; the output out[3:0] of addrgen1 outputs the address addr[16:13];
The d3[15:0] terminal of addrgen2 is connected to the floating-point number corresponding to the constant (6)₁₀, its d2[15:0] terminal to the floating-point number corresponding to (5.5)₁₀, its d1[15:0] terminal to the floating-point number corresponding to (5)₁₀, and its d0[15:0] terminal to the floating-point number corresponding to (4.5)₁₀; the output out[3:0] of addrgen2 outputs the address addr[12:9];
The d3[15:0] terminal of addrgen3 is connected to the floating-point number corresponding to the constant (4)₁₀, its d2[15:0] terminal to the floating-point number corresponding to (3.5)₁₀, its d1[15:0] terminal to the floating-point number corresponding to (3)₁₀, and its d0[15:0] terminal to the floating-point number corresponding to (2.5)₁₀; the output out[3:0] of addrgen3 outputs the address addr[8:5];
The d3[15:0] terminal of addrgen4 is connected to the floating-point number corresponding to the constant (2)₁₀, its d2[15:0] terminal to the floating-point number corresponding to (1.5)₁₀, its d1[15:0] terminal to the floating-point number corresponding to (1)₁₀, and its d0[15:0] terminal to the floating-point number corresponding to (0.5)₁₀; the output out[3:0] of addrgen4 outputs the address addr[4:1], and the cout terminal of addrgen4 outputs addr[0].
The 4-bit address generation module (see Fig. 13) takes as inputs the clock signal nmcycle[1], the 16-bit data data_abs[15:0], the reset signal rst, the 16-bit data d0, d1, d2 and d3, and the cascade (carry) input CIN; its outputs are the 4-bit address addr[3:0] and the cascade output cout. Its function is as follows: the 16-bit input data data_abs[15:0] is compared simultaneously with four constants d0, d1, d2 and d3 (with d0 < d1 < d2 < d3) that partition the value range of the tanh function argument. When data_abs[15:0] ≥ d3, addr[3] is 0, the other address outputs are 1, and the cascade output cout is 1; when d2 ≤ data_abs[15:0] < d3, addr[2] is 0, the other address outputs are 1, and cout is 1; when d1 ≤ data_abs[15:0] < d2, addr[1] is 0, the other address outputs are 1, and cout is 1; when d0 ≤ data_abs[15:0] < d1, addr[0] is 0, the other address outputs are 1, and cout is 1; when the cascade input control signal CIN is active, all bits of addr[3:0] are 1 and cout is 1. The structure of the 4-bit address generation module is shown in Fig. 13: the 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4 are identical in structure, each comprising floating-point comparators fcom_4, fcom_5, fcom_6 and fcom_7, inverters inv_6, inv_7, inv_8 and inv_9, two-input OR gate or21_3, three-input OR gate or31_2, four-input OR gate or41_0, five-input OR gates or51_0 and or51_1, and D flip-flops dff_16, dff_17, dff_18 and dff_19;
Each of the floating-point comparators fcom_4, fcom_5, fcom_6 and fcom_7 has two inputs;
The first input of floating-point comparator fcom_4 serves as the d[15:0] terminal of the 4-bit address generation module, and its second input serves as the d3[15:0] terminal of the module;
The first input of floating-point comparator fcom_5 is connected to the first input of fcom_4, and its second input serves as the d2[15:0] terminal of the module;
The first input of floating-point comparator fcom_6 is connected to the first input of fcom_4, and its second input serves as the d1[15:0] terminal of the module;
The first input of floating-point comparator fcom_7 is connected to the first input of fcom_4, and its second input serves as the d0[15:0] terminal of the module;
The first inputs of two-input OR gate or21_3, three-input OR gate or31_2, four-input OR gate or41_0, five-input OR gate or51_0 and five-input OR gate or51_1 are connected together and serve as the cin terminal of the module;
The output of fcom_4 is connected to both the input of inverter inv_6 and the second input of or21_3; the output of or21_3 is connected to the input d of D flip-flop dff_16;
The output of fcom_5 is connected to both the input of inverter inv_7 and the third input of or31_2; the output of inv_6 is connected to the second input of or31_2; the output of or31_2 is connected to the input d of D flip-flop dff_17;
The output of fcom_6 is connected to both the input of inverter inv_8 and the fourth input of or41_0; the output of inv_6 is connected to the second input of or41_0; the output of inv_7 is connected to the third input of or41_0; the output of or41_0 is connected to the input d of D flip-flop dff_18;
The output of fcom_7 is connected to both the input of inverter inv_9 and the fifth input of or51_0; the output of inv_6 is connected to the second input of or51_0; the output of inv_7 is connected to the third input of or51_0; the output of inv_8 is connected to the fourth input of or51_0; the output of or51_0 is connected to the input d of D flip-flop dff_19;
The output of inv_6 is connected to the second input of or51_1, the output of inv_7 to the third input of or51_1, the output of inv_8 to the fourth input of or51_1, and the output of inv_9 to the fifth input of or51_1; the output of or51_1 serves as the cout terminal of the module;
The reset inputs rst of D flip-flops dff_16, dff_17, dff_18 and dff_19 are connected together and serve as the rst terminal of the module, and their clock inputs clk are connected together and serve as the clk terminal of the module; dff_16 outputs address bit 3, dff_17 outputs address bit 2, dff_18 outputs address bit 1, and dff_19 outputs address bit 0.
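The gate network of one 4-bit address generation module and the cascade of four modules can be modeled in a few lines. This is a behavioral sketch only: Python floats stand in for the 16-bit floating-point constants, the flip-flops are ignored, and the comparator is taken to output 1 when its first input is smaller than its second, as in the comparator flow. The function names are hypothetical.

```python
def fcom(data, const):
    """Comparator convention from the text: 1 if data < const, else 0."""
    return 1 if data < const else 0


def addrgen(d, d3, d2, d1, d0, cin):
    """One 4-bit module: returns (out[3:0] as an integer, cout)."""
    f3, f2, f1, f0 = fcom(d, d3), fcom(d, d2), fcom(d, d1), fcom(d, d0)
    n3, n2, n1, n0 = 1 - f3, 1 - f2, 1 - f1, 1 - f0   # inv_6 .. inv_9
    a3 = cin | f3                      # or21_3
    a2 = cin | n3 | f2                 # or31_2
    a1 = cin | n3 | n2 | f1            # or41_0
    a0 = cin | n3 | n2 | n1 | f0       # or51_0
    cout = cin | n3 | n2 | n1 | n0     # or51_1
    return (a3 << 3) | (a2 << 2) | (a1 << 1) | a0, cout


# Constants (d3, d2, d1, d0) of addrgen1 .. addrgen4 as given in the text.
CONSTS = [(8.0, 7.5, 7.0, 6.5), (6.0, 5.5, 5.0, 4.5),
          (4.0, 3.5, 3.0, 2.5), (2.0, 1.5, 1.0, 0.5)]


def address(d):
    """Cascade the four modules into addr[16:0]; addr[0] is the last cout."""
    addr, carry = 0, 0                 # cin of addrgen1 is grounded
    for d3, d2, d1, d0 in CONSTS:
        out, carry = addrgen(d, d3, d2, d1, d0, carry)
        addr = (addr << 4) | out
    return (addr << 1) | carry


print(format(address(7.2), "017b"))
```

In this model exactly one of the 17 address bits is 0 for any input, which is what lets the address act as a one-hot (active-low) select for the look-up table.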
The look-up table (see Fig. 14) takes as inputs the reset signal rst, the address data addr[16:0], nmcycle[2] and the sign neg; its output is lut_data[15:0]. Its function is as follows: when the reset signal rst is active, the look-up table is initialized and the values of its entries are set to the function values corresponding to the intervals of the tanh function argument; neg and addr together form the index address of the look-up table entries; when an index address is valid, the stored value lut_data[15:0] of the corresponding entry is output. The structure of the look-up table is shown in Fig. 14: the look-up table consists mainly of 33 entries, and its glue logic consists of 33 two-input OR gates and one 16-bit D flip-flop; the stored values of the 33 entries mem_i (i = 32, ..., 0), from the highest address to the lowest, are set in turn to:
(1.00000)₁₀, (1.00000)₁₀, (0.99999)₁₀, (0.99999)₁₀, (0.99998)₁₀, (0.99996)₁₀, (0.99909)₁₀, (0.99975)₁₀, (0.99932)₁₀, (0.99817)₁₀, (0.99505)₁₀, (0.98661)₁₀, (0.96402)₁₀, (0.90514)₁₀, (0.76159)₁₀, (0.46211)₁₀, (0)₁₀, (-0.50000)₁₀, (-0.10000)₁₀, (-0.15000)₁₀, (-0.20000)₁₀, (-0.25000)₁₀, (-0.30000)₁₀, (-0.35000)₁₀, (-0.40000)₁₀, (-0.45000)₁₀, (-0.50000)₁₀, (-0.55000)₁₀, (-0.60000)₁₀, (-0.65000)₁₀, (-0.70000)₁₀, (-0.75000)₁₀, (-0.80000)₁₀. The outputs of the 33 two-input OR gates are connected to the tri-state control inputs of the 33 entries mem_i (i = 32, ..., 0) respectively; neg and addr[15:0] are connected to the two inputs of 17 two-input OR gates, and neg and !addr[15:0] are connected to the two inputs of 16 two-input OR gates. The outputs of the 33 storage cells are tied together, but only one value can be output at a time. The output value is connected to the data input d of flip-flop dff_20, nmcycle[2] is connected to the clock input clk of dff_20, and the output q of dff_20 is lut_data[15:0].
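The positive half of the table can be illustrated as a one-hot select. The sketch below copies the 17 non-negative values listed above and assumes that the address bit addr[k] that is 0 selects entry mem_(16+k); this mapping, the function name and the omission of the neg input and the negative entries are assumptions made for illustration.

```python
# Values of mem_16 .. mem_32 as listed in the text, indexed by address bit k.
TANH_POS = [0.0, 0.46211, 0.76159, 0.90514, 0.96402, 0.98661, 0.99505,
            0.99817, 0.99932, 0.99975, 0.99909, 0.99996, 0.99998, 0.99999,
            0.99999, 1.00000, 1.00000]


def lookup(addr):
    """Return the entry selected by the single 0 bit of addr[16:0]."""
    zeros = [k for k in range(17) if (addr >> k) & 1 == 0]
    if len(zeros) != 1:
        raise ValueError("address must have exactly one 0 bit")
    return TANH_POS[zeros[0]]


print(lookup(0x1FFFD))
```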
The opposite-number arithmetic unit (see Fig. 2, Fig. 5 and Fig. 15) takes as inputs the floating-point adder output data Y[15:0], the cycle control signal mcycle[5] and the reset signal rst; its output is NY[15:0]. Its function is to compute the complement-code representation of the opposite number of the input data. The structure of the opposite-number arithmetic unit is shown in Fig. 15: it comprises the opposite-number complement arithmetic unit DD2 and D flip-flop dff_21.
The input DIN of the opposite-number complement arithmetic unit DD2 is connected to the output result of the floating-point adder, and its output DOUT is connected to the data input d of D flip-flop dff_21; the clock input clk of dff_21 is connected to mcycle[5], the reset input rst of dff_21 is connected to the externally input reset signal rst, and the output q of dff_21 outputs the opposite number of the floating-point adder result. The operation flow of the opposite-number complement arithmetic unit DD2 is shown in Fig. 5.
By configuring the control word, the present invention implements the two activation functions most widely used in the neural network field, the sigmoid function and the tanh function. The implementation structure is simple and uses a synchronous clock design, which facilitates timing checking and verification; the area is small and the power consumption is low, so the device is easy to implement on chip and improves practicality in embedded applications. When a neural network activation function is computed with the present invention, the processing flow is simple and easy to control, which improves the efficiency of the computation. According to the precision required for tanh, the address generator module and the look-up table unit of the configurable neural network activation function implementation device can easily be extended to meet changed precision requirements. The device is therefore an ideal structure for implementing activation functions in embedded neural network processors.
Parts of the present invention that are not described in detail are common knowledge to those skilled in the art.

Claims (14)

1. A configurable neural network activation function implementation device, characterized in that it comprises a controller, a sign judgment module, a range detection module, a parameter register, a floating-point multiplier, a floating-point adder, an opposite-number arithmetic unit, an address generator, a look-up table, a first gating latch and a second gating latch;
Controller: generates, according to the value of the configuration control signal M, the latch control signals needed along the entire operation data path of the selected activation function, and the operation-complete signal done;
Sign judgment module: receives the input operand and judges its sign; if the operand is positive, outputs it to the range detection module, the address generator and the floating-point multiplier; otherwise outputs the absolute value of the operand to the range detection module, the address generator and the floating-point multiplier; in both cases outputs the sign of the operand to the address generator and the first gating latch;
Range detection module: determines which interval the received data value lies in and sends the range interval identification signal to the parameter register;
Parameter register: stores the parameters of the linear functions that approximate the sigmoid activation function, namely the first-order coefficients and offsets; selects the linear function approximating the sigmoid activation function according to the range interval identification signal, outputs the first-order coefficient of that linear function to the floating-point multiplier, and outputs its offset to the floating-point adder;
Floating-point multiplier: receives from the parameter register the first-order coefficient of the linear function selected by the range detection module, computes the product of that coefficient and the data value, and outputs the product to the floating-point adder;
Floating-point adder: receives from the parameter register the offset of the linear function selected by the range detection module, computes the sum of the product from the floating-point multiplier and the offset, and outputs the result to the opposite-number arithmetic unit and the first gating latch;
Opposite-number arithmetic unit: computes the opposite number of the floating-point adder result and outputs it to the first gating latch;
First gating latch: when the sign of the input operand is positive, gates the result from the floating-point adder to the second gating latch; when the sign of the input operand is negative, gates the result from the opposite-number arithmetic unit to the second gating latch;
Address generator: generates a 17-bit address value, used as the look-up table index, according to the interval in which the data value output by the sign judgment module lies;
Look-up table: stores the tanh activation function value corresponding to each data interval, looks up the tanh activation function value of the input operand according to the look-up table index from the address generator, and outputs it to the second gating latch;
Second gating latch: according to the value of the configuration control signal M, gates and latches either the sigmoid activation function result or the tanh activation function result as the output.
2. The configurable neural network activation function implementation device according to claim 1, characterized in that M = 1 indicates the sigmoid activation function operation, for which the controller generates the 8 latch control signals mcycle[7:1] needed along the entire sigmoid operation data path; M = 0 indicates the tanh activation function operation, for which the controller generates the 4 latch control signals nmcycle[3:1] needed along the entire tanh operation data path.
3. The configurable neural network activation function implementation device according to claim 2, characterized in that the controller comprises D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8, AND gates and21_0, and21_1, and21_2, and21_3, and21_4, and21_5, and21_6, and21_7, and21_8, and21_9 and and21_10, OR gates or21_0 and or21_1, two-input NOR gate nor21_0, and inverters inv_0 and inv_1;
The clock inputs clk of D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8 are connected to the externally input clock signal clk; the data input d of dff_0 is connected to logic "1", that is, high level; the output q of dff_0 outputs the cycle control signal cycle0 and is also connected to the data input d of dff_1; the output q of dff_1 is connected to the data input d of dff_2; the output q of dff_2 is connected to the data input d of dff_3; the output q of dff_3 is connected to the data input d of dff_4; the output q of dff_4 is connected to the data input d of dff_5; the output q of dff_5 is connected to the data input d of dff_6; the output q of dff_6 is connected to the data input d of dff_7; the output q of dff_7 is connected to the data input d of dff_8;
The input of inverter inv_0 is connected to the configuration control signal M; the output of inv_0 is connected to the first inputs of and21_0, and21_2 and and21_4; the output q of dff_1 is connected to the second input of and21_0 and the first input of and21_1; the configuration control signal M is connected to the second inputs of and21_1, and21_3, and21_5, and21_6, and21_7, and21_8, and21_9 and and21_10; the output q of dff_2 is connected to the second input of and21_2 and the first input of and21_3; the output q of dff_3 is connected to the second input of and21_4 and the first input of and21_5; the output q of dff_4 is connected to the first input of and21_6; the output q of dff_5 is connected to the first input of and21_7; the output q of dff_6 is connected to the first input of and21_8; the output q of dff_7 is connected to the first input of and21_9; the output q of dff_8 is connected to the first input of and21_10;
The output of and21_0 outputs the signal nmcycle[1], the output of and21_1 outputs mcycle[1], the output of and21_2 outputs nmcycle[2], the output of and21_3 outputs mcycle[2], the output of and21_4 outputs nmcycle[3], the output of and21_5 outputs mcycle[3], the output of and21_6 outputs mcycle[4], the output of and21_7 outputs mcycle[5], the output of and21_8 outputs mcycle[6], and the output of and21_9 outputs mcycle[7];
The output of and21_6 is connected to the first input of or21_1, and the output of and21_10 is connected to the second input of or21_1; the output of or21_1 outputs the operation-complete signal done and is also connected to the first input of two-input NOR gate nor21_0; the input of inverter inv_1 is connected to the externally input reset signal rst, and the output of inv_1 is connected to the second input of nor21_0; the output of nor21_0 is connected to the reset inputs rst of D flip-flops dff_0, dff_1, dff_2, dff_3, dff_4, dff_5, dff_6, dff_7 and dff_8; the output of and21_5 is connected to the first input of or21_0, the output of and21_9 is connected to the second input of or21_0, and the output of or21_0 outputs the signal result_clk.
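The AND-gate stage of the controller, which steers the shift-register taps to either the mcycle or the nmcycle signals depending on M, can be sketched combinationally. The function name and the dictionary layout are hypothetical; the shift register itself, the reset path through nor21_0, and the done and result_clk outputs are not modeled.

```python
def gate_cycles(m, q):
    """Map flip-flop outputs q[0..8] (dff_0 .. dff_8) to control signals.

    m is the configuration control signal M (1 = sigmoid, 0 = tanh).
    """
    nm = 1 - m                                   # inv_0
    nmcycle = {1: nm & q[1],                     # and21_0
               2: nm & q[2],                     # and21_2
               3: nm & q[3]}                     # and21_4
    mcycle = {1: m & q[1],                       # and21_1
              2: m & q[2],                       # and21_3
              3: m & q[3],                       # and21_5
              4: m & q[4],                       # and21_6
              5: m & q[5],                       # and21_7
              6: m & q[6],                       # and21_8
              7: m & q[7]}                       # and21_9
    return {"mcycle": mcycle, "nmcycle": nmcycle}


taps = [1, 0, 1, 0, 0, 0, 0, 0, 0]               # only dff_0 and dff_2 high
print(gate_cycles(1, taps)["mcycle"][2], gate_cycles(0, taps)["nmcycle"][2])
```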
4. The configurable neural network activation function implementation device according to claim 2, characterized in that the sign judgment module comprises the opposite-number complement arithmetic unit DD1, a 16-bit data latch and D flip-flop dff_9;
The input DIN of the opposite-number complement arithmetic unit DD1 and the input d2 of the 16-bit data latch are both connected to the input operand, and the output DOUT of DD1 is connected to the input d1 of the 16-bit data latch;
Bit 11 of the input operand, data[11], is connected to both the control input ctrl of the 16-bit data latch and the data input d of D flip-flop dff_9;
The cycle control signal cycle0 is connected to both the clock input clk of the 16-bit data latch and the clock input clk of dff_9; the externally input reset signal rst is connected to both the reset input rst of the 16-bit data latch and the reset input rst of dff_9;
The output d3 of the 16-bit data latch outputs the absolute value data_abs[15:0] of the input operand, and the output q of dff_9 outputs the sign neg of the input operand.
5. The configurable neural network activation function implementation device according to claim 2, characterized in that the opposite-number arithmetic unit comprises the opposite-number complement arithmetic unit DD2 and D flip-flop dff_21;
The input DIN of the opposite-number complement arithmetic unit DD2 is connected to the output result of the floating-point adder, and its output DOUT is connected to the data input d of D flip-flop dff_21;
The clock input clk of dff_21 is connected to mcycle[5], the reset input rst of dff_21 is connected to the externally input reset signal rst, and the output q of dff_21 outputs the opposite number of the floating-point adder result.
6. The configurable neural network activation function implementation device according to claim 4 or 5, characterized in that the data negation (two's-complement) units DD1 and DD2 follow the same procedure, as follows:
Step 1: denote the data received at input DIN as DIN[15:0]; compute the two's-complement negation D_MID[13:0] of the mantissa of DIN[15:0]; then carry out steps 2, 3 and 4 concurrently to evaluate D_MID[13:0];
Step 2: determine whether D_MID[13:0] is 0; if so, go to step 5;
Step 3: determine whether D_MID[13:0] has overflowed, where D_MID[13] xor D_MID[12] = 1 indicates overflow; if it has overflowed, go to step 6;
Step 4: if D_MID[13] xor D_MID[12] = 0, go to step 7;
Step 5: set D_MID[17:14] to the most negative value so as to obtain the correct representation of 0; then carry out steps 8, 9 and 10 concurrently to evaluate D_MID[17:14];
Step 6: shift D_MID[13:0] right by one bit and set the exponent D_MID[17:14] = DIN[15:12] plus 1; then carry out steps 8, 9 and 10 concurrently to evaluate D_MID[17:14];
Step 7: shift D_MID[13:0] left by K bits and set the exponent D_MID[17:14] = DIN[15:12] minus K; then carry out steps 8, 9 and 10 concurrently to evaluate D_MID[17:14];
Step 8: determine whether the exponent result D_MID[17:14] has overflowed in the positive direction; if so, go to step 11;
Step 9: determine whether the exponent result D_MID[17:14] has overflowed in the negative direction; if so, go to step 12;
Step 10: determine whether the exponent result D_MID[17:14] is within range; if so, go to step 13;
Step 11: if D_MID[13:0] > 0, set D_MID to the largest positive floating-point value; if D_MID[13:0] < 0, set D_MID to the most negative floating-point value; go to step 13;
Step 12: set D_MID[13:0] to 0 and D_MID[17:14] to -8; go to step 13;
Step 13: remove the sign-extension bit and the implicit bit from the 18-bit operand D_MID to obtain the 16-bit floating-point data DOUT[15:0] output at terminal DOUT, where DOUT[15:12] = D_MID[17:14], DOUT[11] = D_MID[12], and DOUT[10:0] = D_MID[10:0].
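The thirteen steps of claim 6 can be sketched in software as follows. This is a minimal behavioural sketch, not the patented circuit. It assumes a 16-bit layout inferred from the bit slices in step 13: a 4-bit two's-complement exponent in [15:12], a sign bit in [11], an 11-bit fraction in [10:0], an implicit bit equal to the inverse of the sign, and exponent -8 reserved for zero. The helper names `unpack`, `pack` and `negate` are hypothetical, and treating a result exponent of -8 as underflow is a simplification of step 9.

```python
# Assumed format (inferred, not stated outright in the claim):
# [15:12] two's-complement exponent, [11] sign, [10:0] fraction,
# implicit bit = NOT sign, exponent -8 encodes zero.
EXP_MIN, EXP_MAX = -8, 7
ZERO = (EXP_MIN & 0xF) << 12


def unpack(word):
    """Return (exponent, signed mantissa scaled by 2**11)."""
    exp = (word >> 12) & 0xF
    exp = exp - 16 if exp >= 8 else exp           # sign-extend the exponent
    sign = (word >> 11) & 1
    frac = word & 0x7FF
    if exp == EXP_MIN:                            # assumed encoding of zero
        return exp, 0
    # positive: 01.f -> 2048 + frac ; negative: 10.f -> -4096 + frac
    return exp, (-4096 + frac) if sign else (2048 + frac)


def pack(exp, man):
    """Step 13: drop the sign-extension bit and the implicit bit."""
    return ((exp & 0xF) << 12) | (((man >> 12) & 1) << 11) | (man & 0x7FF)


def negate(word):
    exp, man = unpack(word)
    d_mid = -man                                  # step 1
    if d_mid == 0:                                # steps 2 and 5
        return ZERO
    if ((d_mid >> 13) ^ (d_mid >> 12)) & 1:       # step 3, then step 6
        d_mid >>= 1
        exp += 1
    else:                                         # step 4, then step 7
        while ((d_mid >> 12) & 1) == ((d_mid >> 11) & 1):
            d_mid <<= 1                           # K left shifts in total
            exp -= 1
    if exp > EXP_MAX:                             # steps 8 and 11: saturate
        return pack(EXP_MAX, 4095) if d_mid > 0 else pack(EXP_MAX, -4096)
    if exp <= EXP_MIN:                            # steps 9 and 12: flush to 0
        return ZERO
    return pack(exp, d_mid)                       # steps 10 and 13
```

Under these assumptions, `negate(0x0000)` (the encoding of +1.0) yields `0xF800` (-2 x 2^-1 = -1.0), and negating the most negative value `0x7800` saturates to the largest positive value `0x77FF`.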
7. The configurable neural network activation function implementation device according to claim 2, characterized in that the range detection module comprises floating-point comparators fcom_1, fcom_2 and fcom_3, inverters inv_3, inv_4 and inv_5, two-input OR gate or21_2, three-input OR gates or31_0 and or31_1, and D flip-flops dff_11, dff_12, dff_13 and dff_14;
The first input of floating-point comparator fcom_1 is connected to the data data_abs[15:0] output by the sign determination module, and its second input is connected to constant C0, where C0 is (5.0000)_10;
The first input of floating-point comparator fcom_2 is connected to the data data_abs[15:0] output by the sign determination module, and its second input is connected to constant C1, where C1 is (1.0000)_10;
The first input of floating-point comparator fcom_3 is connected to the data data_abs[15:0] output by the sign determination module, and its second input is connected to constant C2, where C2 is (1.0000)_10;
The output of floating-point comparator fcom_1 is connected to both the data input d of D flip-flop dff_11 and the input of inverter inv_3; the output of inverter inv_3 is connected to the first input of two-input OR gate or21_2, the first input of three-input OR gate or31_0 and the first input of three-input OR gate or31_1; the output of floating-point comparator fcom_2 is connected to both the input of inverter inv_4 and the second input of two-input OR gate or21_2; the output of two-input OR gate or21_2 is connected to the data input d of D flip-flop dff_12; the output of inverter inv_4 is connected to the second input of three-input OR gate or31_0 and the second input of three-input OR gate or31_1; the output of floating-point comparator fcom_3 is connected to both the third input of three-input OR gate or31_0 and the input of inverter inv_5; the output of inverter inv_5 is connected to the third input of three-input OR gate or31_1; the output of three-input OR gate or31_0 is connected to the data input d of D flip-flop dff_13, and the output of three-input OR gate or31_1 is connected to the data input d of D flip-flop dff_14;
The reset inputs rst of D flip-flops dff_11, dff_12, dff_13 and dff_14 are connected to the external reset signal rst, and their clock inputs clk are connected to mcycle[1]; the output q of dff_14 outputs signal range[0], the output q of dff_13 outputs signal range[1], the output q of dff_12 outputs signal range[2], and the output q of dff_11 outputs signal range[3].
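The gate network of claim 7 can be traced with a short combinational sketch. It assumes the comparator convention of claim 10 (output 1 when the first input is less than the constant) and leaves out the output flip-flops. The function names `fcom` and `range_detect` are hypothetical. The observation that the outputs form an active-low one-hot code is an inference from the wiring, not something the claim states.

```python
def fcom(x, c):
    """Comparator convention assumed from claim 10: 1 if x < c, else 0."""
    return 1 if x < c else 0


def range_detect(data_abs, c0=5.0, c1=1.0, c2=1.0):
    """Return [range3, range2, range1, range0] as wired in the claim."""
    f1, f2, f3 = fcom(data_abs, c0), fcom(data_abs, c1), fcom(data_abs, c2)
    n1, n2, n3 = 1 - f1, 1 - f2, 1 - f3      # inv_3, inv_4, inv_5
    range3 = f1                               # fcom_1 -> dff_11
    range2 = n1 | f2                          # or21_2 -> dff_12
    range1 = n1 | n2 | f3                     # or31_0 -> dff_13
    range0 = n1 | n2 | n3                     # or31_1 -> dff_14
    return [range3, range2, range1, range0]
```

With three distinct, decreasing constants (for instance the hypothetical set 5.0, 2.0, 1.0) exactly one output is 0 for every input, marking the interval that contains `data_abs`. With the constants as listed in the claim (C1 equal to C2), `range1` can never go low.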
8. The configurable neural network activation function implementation device according to claim 2, characterized in that the address generator comprises four 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4;
The cin terminal of addrgen1 is grounded; the cout terminal of addrgen1 is connected to the cin terminal of addrgen2, the cout terminal of addrgen2 is connected to the cin terminal of addrgen3, and the cout terminal of addrgen3 is connected to the cin terminal of addrgen4; the clk terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to nmcycle[1]; the rst terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to the external reset signal rst; the d[15:0] terminals of addrgen1, addrgen2, addrgen3 and addrgen4 are connected to the data output by the sign determination module; the d3[15:0] terminal of addrgen1 is connected to the floating-point representation of the constant (8)_10, the d2[15:0] terminal to that of (7.5)_10, the d1[15:0] terminal to that of (7)_10, and the d0[15:0] terminal to that of (6.5)_10; the output out[3:0] of addrgen1 outputs address bits addr[16:13];
The d3[15:0] terminal of addrgen2 is connected to the floating-point representation of the constant (6)_10, the d2[15:0] terminal to that of (5.5)_10, the d1[15:0] terminal to that of (5)_10, and the d0[15:0] terminal to that of (4.5)_10; the output out[3:0] of addrgen2 outputs address bits addr[12:9];
The d3[15:0] terminal of addrgen3 is connected to the floating-point representation of the constant (4)_10, the d2[15:0] terminal to that of (3.5)_10, the d1[15:0] terminal to that of (3)_10, and the d0[15:0] terminal to that of (2.5)_10; the output out[3:0] of addrgen3 outputs address bits addr[8:5];
The d3[15:0] terminal of addrgen4 is connected to the floating-point representation of the constant (2)_10, the d2[15:0] terminal to that of (1.5)_10, the d1[15:0] terminal to that of (1)_10, and the d0[15:0] terminal to that of (0.5)_10; the output out[3:0] of addrgen4 outputs address bits addr[4:1], and the cout terminal of addrgen4 outputs addr[0].
9. The configurable neural network activation function implementation device according to claim 8, characterized in that the 4-bit address generation modules addrgen1, addrgen2, addrgen3 and addrgen4 have the same structure, each comprising floating-point comparators fcom_4, fcom_5, fcom_6 and fcom_7, inverters inv_6, inv_7, inv_8 and inv_9, two-input OR gate or21_3, three-input OR gate or31_2, four-input OR gate or41_0, five-input OR gates or51_0 and or51_1, and D flip-flops dff_16, dff_17, dff_18 and dff_19;
Floating-point comparators fcom_4, fcom_5, fcom_6 and fcom_7 each have two inputs;
The first input of floating-point comparator fcom_4 serves as the d[15:0] terminal of the 4-bit address generation module, and its second input serves as the d3[15:0] terminal of the module;
The first input of floating-point comparator fcom_5 is connected to the first input of floating-point comparator fcom_4, and its second input serves as the d2[15:0] terminal of the module;
The first input of floating-point comparator fcom_6 is connected to the first input of floating-point comparator fcom_4, and its second input serves as the d1[15:0] terminal of the module;
The first input of floating-point comparator fcom_7 is connected to the first input of floating-point comparator fcom_4, and its second input serves as the d0[15:0] terminal of the module;
The first inputs of two-input OR gate or21_3, three-input OR gate or31_2, four-input OR gate or41_0, five-input OR gate or51_0 and five-input OR gate or51_1 are connected together and serve as the cin terminal of the module;
The output of floating-point comparator fcom_4 is connected to both the input of inverter inv_6 and the second input of two-input OR gate or21_3; the output of or21_3 is connected to the data input d of D flip-flop dff_16;
The output of floating-point comparator fcom_5 is connected to both the input of inverter inv_7 and the third input of three-input OR gate or31_2; the output of inverter inv_6 is connected to the second input of or31_2; the output of or31_2 is connected to the data input d of D flip-flop dff_17;
The output of floating-point comparator fcom_6 is connected to both the input of inverter inv_8 and the fourth input of four-input OR gate or41_0; the output of inverter inv_6 is connected to the second input of or41_0, and the output of inverter inv_7 is connected to the third input of or41_0; the output of or41_0 is connected to the data input d of D flip-flop dff_18;
The output of floating-point comparator fcom_7 is connected to both the input of inverter inv_9 and the fifth input of five-input OR gate or51_0; the output of inverter inv_6 is connected to the second input of or51_0, the output of inverter inv_7 to the third input of or51_0, and the output of inverter inv_8 to the fourth input of or51_0; the output of or51_0 is connected to the data input d of D flip-flop dff_19;
The output of inverter inv_6 is connected to the second input of five-input OR gate or51_1, the output of inverter inv_7 to the third input of or51_1, the output of inverter inv_8 to the fourth input of or51_1, and the output of inverter inv_9 to the fifth input of or51_1; the output of or51_1 serves as the cout terminal of the module;
The reset inputs rst of D flip-flops dff_16, dff_17, dff_18 and dff_19 are connected together and serve as the rst terminal of the module, and their clock inputs clk are connected together and serve as the clk terminal of the module; dff_16 outputs address bit 3, dff_17 outputs address bit 2, dff_18 outputs address bit 1, and dff_19 outputs address bit 0.
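The wiring of claims 8 and 9 can be followed with a small combinational model. It is a sketch under assumptions: comparators output 1 when the input is less than the constant (as in claim 10), ordinary Python floats stand in for the 16-bit floating-point constants, and the flip-flops are omitted. The names `fcom`, `addrgen` and `address` are illustrative. Reading the 17-bit result as an active-low one-hot address is an inference from the OR-gate structure.

```python
def fcom(x, c):
    """Comparator convention assumed from claim 10: 1 if x < c, else 0."""
    return 1 if x < c else 0


def addrgen(d, cin, d3, d2, d1, d0):
    """One 4-bit module: returns ([out3, out2, out1, out0], cout)."""
    f4, f5, f6, f7 = (fcom(d, c) for c in (d3, d2, d1, d0))
    n4, n5, n6, n7 = 1 - f4, 1 - f5, 1 - f6, 1 - f7   # inv_6 .. inv_9
    out3 = cin | f4                        # or21_3 -> dff_16
    out2 = cin | n4 | f5                   # or31_2 -> dff_17
    out1 = cin | n4 | n5 | f6              # or41_0 -> dff_18
    out0 = cin | n4 | n5 | n6 | f7         # or51_0 -> dff_19
    cout = cin | n4 | n5 | n6 | n7         # or51_1
    return [out3, out2, out1, out0], cout


def address(data_abs):
    """Chain addrgen1..addrgen4; returns a list where addr[i] is bit i."""
    thresholds = [8.0 - 0.5 * i for i in range(16)]    # 8, 7.5, ..., 0.5
    bits_msb_first, carry = [], 0                       # cin of addrgen1 grounded
    for m in range(4):
        out, carry = addrgen(data_abs, carry, *thresholds[4 * m:4 * m + 4])
        bits_msb_first += out
    bits_msb_first.append(carry)                        # addr[0] = cout of addrgen4
    return bits_msb_first[::-1]
```

In this model exactly one of the 17 bits is 0 for any non-negative input: for example, an input of 7.2 clears addr[14] (the interval from 7 to 7.5), and an input below 0.5 clears addr[0]. Once a module has found its interval, its cout of 1 forces every later module's outputs to 1.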
10. The configurable neural network activation function implementation device according to claim 7 or 9, characterized in that floating-point comparators fcom_1, fcom_2, fcom_3, fcom_4, fcom_5, fcom_6 and fcom_7 operate in the same way, the procedure of each floating-point comparator being as follows:
Step 1: compare the mantissas of the data at the first input and the constant at the second input:
According to the floating-point data format, restore the implicit bit of each mantissa and extend it by one sign bit;
Subtract the mantissa of the second-input constant from the mantissa of the first-input data, and denote the result dm[13:0];
Step 2: compare the exponents of the data at the first input and the constant at the second input:
Extend each exponent by one sign bit, subtract the exponent of the second-input constant from the exponent of the first-input data, and denote the result de[4:0];
Step 3: determine the sign of the mantissa comparison result dm[13:0]:
Compute sdm using the formula sdm = dm[13] xor dm[12];
When sdm is 0, dm[13:0] is positive; when sdm is 1, dm[13:0] is negative;
Step 4: determine the sign of the exponent comparison result de[4:0]:
Compute sde using the formula sde = de[4] xor de[3];
When sde is 0, de[4:0] is positive; when sde is 1, de[4:0] is negative;
Step 5: determine the comparison result: if sde is 0 and de[4:0] is not 0, the first-input data is greater than the second-input constant and the output is 0; if sde is 0, de[4:0] is 0 and sdm is 0, the first-input data is greater than the second-input constant and the output is 0; if sde is 0, de[4:0] is 0 and sdm is 1, the first-input data is less than the second-input constant and the output is 1; if sde is 1, the first-input data is less than the second-input constant and the output is 1.
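The decision table in step 5 amounts to comparing exponents first and mantissas second. The sketch below illustrates that ordering under stated assumptions: both operands are non-negative and normalized, exponents are passed as signed integers, and mantissas are passed as integers with the implicit bit already restored. It takes the flags sde and sdm directly from the sign of each difference rather than from the xor expressions in the claim, and the function name `fcom` and its argument names are hypothetical.

```python
def fcom(exp_a, man_a, exp_b, man_b):
    """Return 1 if A < B, else 0 (A: first input, B: second-input constant).

    exp_* are signed exponents; man_* are mantissas with the implicit bit
    restored. Both operands are assumed non-negative and normalized.
    """
    dm = man_a - man_b            # step 1: mantissa difference
    de = exp_a - exp_b            # step 2: exponent difference
    sdm = 1 if dm < 0 else 0      # step 3: sign flag, taken from the sign here
    sde = 1 if de < 0 else 0      # step 4: sign flag, taken from the sign here
    if sde == 0 and de != 0:      # larger exponent: A > B
        return 0
    if sde == 0 and de == 0:      # equal exponents: the mantissas decide
        return sdm
    return 1                      # smaller exponent: A < B
```

For example, with mantissas scaled by 2^11, 5.0 is (exponent 2, mantissa 2560) and 4.0 is (exponent 2, mantissa 2048), so `fcom(2, 2560, 2, 2048)` returns 0. Equal operands also return 0, matching the claim's treatment of sdm = 0 as "greater".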
11. The configurable neural network activation function implementation device according to claim 2, characterized in that the parameter register comprises four 32-bit tri-state gates and one 32-bit D flip-flop dff_15; the control input of the j-th 32-bit tri-state gate is connected to bit j of the range-interval identification signal output by the range detection module, j = 0, 1, 2, 3; the reset inputs of the four 32-bit tri-state gates are all connected to the external reset signal rst; the outputs of the four 32-bit tri-state gates are connected to the data input d of the 32-bit D flip-flop dff_15; the reset input rst of dff_15 is connected to the external reset signal rst, and the clock input clk of dff_15 is connected to mcycle[2]; the upper 16 bits of the output q of dff_15 are the first-order coefficient A[15:0], and the lower 16 bits are the offset B[15:0].
12. The configurable neural network activation function implementation device according to claim 2, characterized in that the lookup table comprises 33 16-bit tri-state gates, 33 two-input OR gates, 16 inverters and one 16-bit D flip-flop dff_20;
The second inputs of the first 17 two-input OR gates are all connected to the sign output by the sign determination module; among the first 17 two-input OR gates, the first input of the k1-th two-input OR gate is connected to bit k1 of the address generated by the address generator, k1 = 0, 1, 2, ..., 16;
The last 16 two-input OR gates correspond one-to-one with the 16 inverters; the second input of each of the last 16 two-input OR gates is connected to the output of its corresponding inverter, and the inputs of the 16 inverters are all connected to the sign output by the sign determination module; among the last 16 two-input OR gates, the first input of the k2-th two-input OR gate is connected to bit (k2 - 16) of the address generated by the address generator, k2 = 17, 18, 19, ..., 32;
The output of each two-input OR gate is connected to the second input of the corresponding 16-bit tri-state gate; the first inputs of the 33 16-bit tri-state gates and the reset input rst of the 16-bit D flip-flop dff_20 are all connected to the external reset signal rst; the outputs of the 33 16-bit tri-state gates are tied together and connected to the data input d of dff_20; the clock input clk of dff_20 is connected to nmcycle[2], and the output q of dff_20 outputs the tanh activation function value lut_data[15:0] corresponding to the input operand.
13. The configurable neural network activation function implementation device according to claim 2, characterized in that the first gated latch comprises 16 gated latch units of identical structure;
The p-th gated latch unit comprises selector mux_1, inverter inv_2 and D flip-flop dff_22; the A input of selector mux_1 is connected to bit p of the floating-point adder result, and the B input of selector mux_1 is connected to bit p of the negation unit result; the output of selector mux_1 is connected to the data input d of D flip-flop dff_22; the input of inverter inv_2 and the AS terminal of selector mux_1 are both connected to the data sign output by the sign determination module, and the output of inverter inv_2 is connected to the BS terminal of selector mux_1; the clock input clk of D flip-flop dff_22 is connected to mcycle[6], and the reset input rst of dff_22 is connected to the external reset signal; the output of dff_22 outputs bit p of the output data of the first gated latch, p = 0, 1, 2, 3, ..., 15.
14. The configurable neural network activation function implementation device according to claim 2, characterized in that the second gated latch comprises 16 gated latch units of identical structure;
The q-th gated latch unit comprises selector mux_2, inverter inv_2' and D flip-flop dff_23; the A input of selector mux_2 is connected to bit q of the lookup table output, and the B input of selector mux_2 is connected to bit q of the output of the first gated latch; the output of selector mux_2 is connected to the data input d of D flip-flop dff_23; the input of inverter inv_2' and the AS terminal of selector mux_2 are both connected to the configuration control signal M, and the output of inverter inv_2' is connected to the BS terminal of selector mux_2; the clock input clk of D flip-flop dff_23 is connected to the cycle control signal of the controller, and the reset input rst of dff_23 is connected to the external reset signal; the output of dff_23 outputs bit q of the output data of the second gated latch.
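The two gated latches in claims 13 and 14 each reduce to a 16-bit two-way selector with complementary select lines. The sketch below models that selection only and omits the flip-flops. It assumes that a select line at 1 passes its input (the claims do not state the polarity), and the function names are hypothetical.

```python
def mux16(a, b, as_, bs):
    """Bitwise AND-OR selector over 16 bits; as_ and bs are complementary."""
    mask_a = 0xFFFF if as_ else 0
    mask_b = 0xFFFF if bs else 0
    return (a & mask_a) | (b & mask_b)


def first_gated_latch(adder_result, negated_result, sign):
    # AS = sign from the sign determination module, BS = NOT sign (inv_2)
    return mux16(adder_result, negated_result, sign, 1 - sign)


def second_gated_latch(lut_data, first_latch_out, m):
    # AS = configuration control signal M, BS = NOT M (inv_2')
    return mux16(lut_data, first_latch_out, m, 1 - m)
```

Under the assumed polarity, M = 1 routes the lookup table value (the tanh path) to the output and M = 0 routes the first gated latch's value.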
CN201910041332.7A 2019-01-16 2019-01-16 Configurable neural network activation function implementation device Active CN109816105B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910041332.7A CN109816105B (en) 2019-01-16 2019-01-16 Configurable neural network activation function implementation device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910041332.7A CN109816105B (en) 2019-01-16 2019-01-16 Configurable neural network activation function implementation device

Publications (2)

Publication Number Publication Date
CN109816105A (en) 2019-05-28
CN109816105B (en) 2021-02-23

Family

ID=66604394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910041332.7A Active CN109816105B (en) 2019-01-16 2019-01-16 Configurable neural network activation function implementation device

Country Status (1)

Country Link
CN (1) CN109816105B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020066046A1 (en) * 2000-10-24 2002-05-30 Chin-Shuing Liu Apparatus for directly connecting to the internet and method thereof
TW201331855A (en) * 2012-01-19 2013-08-01 Univ Nat Taipei Technology High-speed hardware-based back-propagation feedback type artificial neural network with free feedback nodes
CN105987775A (en) * 2016-07-20 2016-10-05 天津理工大学中环信息学院 Temperature sensor nonlinearity correction method and system based on BP neural network
CN106022468A (en) * 2016-05-17 2016-10-12 成都启英泰伦科技有限公司 Artificial neural network processor integrated circuit and design method therefor
CN107003989A (en) * 2014-12-19 2017-08-01 英特尔公司 Method and apparatus for distributed and cooperative computation in artificial neural networks
CN107844439A (en) * 2016-09-20 2018-03-27 三星电子株式会社 Storage device and system supporting command bus training, and operating method thereof
EP3343463A1 (en) * 2016-12-31 2018-07-04 VIA Alliance Semiconductor Co., Ltd. Neural network unit with re-shapeable memory
CN108564169A (en) * 2017-04-11 2018-09-21 上海兆芯集成电路有限公司 Hardware processing element, neural network unit and computer usable medium
KR20180120009A (en) * 2017-04-26 2018-11-05 광주과학기술원 A stochastic implementation method of an activation function for an artificial neural network and a system including the same
CN108781265A (en) * 2016-03-30 2018-11-09 株式会社尼康 Feature extraction element, feature extraction system, and determination apparatus
CN108885600A (en) * 2016-03-16 2018-11-23 美光科技公司 Apparatuses and methods for operations using compressed and decompressed data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHE-WEI LIN,ET AL: "《A digital circuit design of hyperbolic tangent sigmoid function for neural networks》", 《2008 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS》 *
WU CHENGJUN, ET AL: 《Circuit design of an approximate adder for neural network accelerators》, 《AERONAUTICAL SCIENCE & TECHNOLOGY》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414677A (en) * 2019-07-11 2019-11-05 东南大学 An in-memory computing circuit suitable for fully connected binarized neural networks
CN110610235A (en) * 2019-08-22 2019-12-24 北京时代民芯科技有限公司 Neural network activation function calculation circuit
CN110610235B (en) * 2019-08-22 2022-05-13 北京时代民芯科技有限公司 Neural network activation function calculation circuit
TWI755043B (en) * 2019-09-04 2022-02-11 美商聖巴諾瓦系統公司 Sigmoid function in hardware and a reconfigurable data processor including same
CN111047007A (en) * 2019-11-06 2020-04-21 北京中科胜芯科技有限公司 Activation function calculation unit for quantized LSTM
CN111047007B (en) * 2019-11-06 2021-07-30 北京中科胜芯科技有限公司 Activation function calculation unit for quantized LSTM
CN112256094A (en) * 2020-11-13 2021-01-22 广东博通科技服务有限公司 Deep learning-based activation function device and use method thereof
CN112734023A (en) * 2021-02-02 2021-04-30 中国科学院半导体研究所 Reconfigurable circuit applied to activation function of recurrent neural network
CN112734023B (en) * 2021-02-02 2023-10-13 中国科学院半导体研究所 Reconfigurable circuit applied to activation function of cyclic neural network

Also Published As

Publication number Publication date
CN109816105B (en) 2021-02-23

Similar Documents

Publication Publication Date Title
CN109816105A (en) Configurable neural network activation function implementation device
Liu et al. A stochastic computational multi-layer perceptron with backward propagation
CN110058840A (en) A low-power multiplier based on 4-Booth coding
CN108537332A (en) An efficient hardware implementation method for the Sigmoid function based on the Remez algorithm
Qin et al. A novel approximation methodology and its efficient vlsi implementation for the sigmoid function
CN108921292A (en) Approximate computing system for deep neural network accelerator applications
Wang et al. Constructing higher-dimensional digital chaotic systems via loop-state contraction algorithm
Scholl Multi-output functional decomposition with exploitation of don't cares
Zhang et al. Base-2 softmax function: Suitability for training and efficient hardware implementation
Ajit et al. FPGA based performance comparison of different basic adder topologies with parallel processing adder
Chen et al. A cordic-based architecture with adjustable precision and flexible scalability to implement sigmoid and tanh functions
Raghuram et al. Digital implementation of the softmax activation function and the inverse softmax function
Perri et al. Parallel architecture of power‐of‐two multipliers for FPGAs
Pan et al. A semi-tensor product based all solutions boolean satisfiability solver
Madenda et al. New Approach of Signed Binary Numbers Multiplication and Its Implementation on FPGA
Dakhole et al. Multi-digit quaternary adder on programmable device: Design & verification
Jalilvand et al. Fuzzy-logic using unary bit-stream processing
US20210383264A1 (en) Method and Architecture for Fuzzy-Logic Using Unary Processing
CN110458277A (en) Configurable-precision convolution hardware structure for deep learning hardware accelerators
Hacene et al. Efficient hardware implementation of incremental learning and inference on chip
Madoš et al. Field Programmable Gate Array hardware accelerator of prime implicants generation for single-output Boolean functions minimization
Métivier An algorithm for computing asynchronous automata in the case of acyclic non-commutation graphs
CN112949830B (en) Intelligent inference network system and addition unit and pooling unit circuitry
Gonzalez-Guerrero et al. Asynchronous Stochastic Computing
Chhabra et al. A Design Approach for Mac Unit Using Vedic Multiplier

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant