CN107247992B - A sigmoid function fitting hardware circuit based on the Remez approximation algorithm - Google Patents
- Publication number
- CN107247992B (application No. CN201710416069.6A)
- Authority
- CN
- China
- Prior art keywords
- fitting
- point
- polynomial
- coefficient
- section
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Neurology (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a sigmoid function fitting hardware circuit based on the Remez approximation algorithm, characterized by the following steps: (1) determine the order of the fitting polynomial; (2) obtain the fitting interval of the sigmoid function; (3) obtain the piecewise intervals; (4) obtain the fitting polynomials; (5) design the coefficient memory module; (6) design the polynomial operation module; (7) design the judgment module; (8) obtain the fitting hardware circuit; (9) determine the fitting execution interval in which the operand lies; (10) read the fitting polynomial coefficients; (11) perform the fitting calculation in the polynomial operation module. While reducing hardware resource consumption, the invention improves computational accuracy, increases computation speed, and makes the computing structure more flexible.
Description
This application is a divisional application of the application filed on December 30, 2014, application No. 2014108504707, entitled "A sigmoid function fitting hardware circuit based on the Remez approximation algorithm".
Technical field
The present invention relates to the field of artificial neural networks, and specifically to a sigmoid function fitting hardware circuit based on the Remez approximation algorithm.
Background
"Neural network" is short for artificial neural network, and research on and application of neural networks is one of the current research hotspots. The advantages of neural networks lie mainly in two aspects: parallelism, and powerful nonlinear information processing and learning ability. Several neural network models now have mature theoretical foundations and working principles, which has made further study of their application in fields such as signal processing, control systems, and speech recognition a hotspot. Compared with software simulation, hardware-implemented neural networks offer high processing speed and high parallelism, and more easily meet the real-time requirements of neural network computation.
Implementing a neural network on an FPGA poses two difficulties: data representation, and the method used to approximate the network's activation function. These two factors determine both the hardware resource utilization efficiency and the approximation accuracy. Neural network activation functions take many forms. The sigmoid function is the most widely used of them and also the hardest to implement, so it is a key part of any FPGA implementation of a neural network.
Current FPGA implementations of the sigmoid function include direct lookup tables, piecewise linear approximation, polynomial approximation, the CORDIC algorithm, and genetic algorithms.

The direct lookup table method (Zhiliang Nie, 2012; Alexander Gomperts, 2010) stores sigmoid results in a memory module and reads the result directly according to the input operand. This method consumes a large amount of storage, and its hardware precision is not high.

The piecewise linear approximation method (Manish Panicker, 2012) uses a 3-segment piecewise linear approximation over (−5, 5) in a 32-bit fixed-point number format. It uses few computational and storage resources, but its accuracy is low, with a maximum mean square error of 0.00187.

The CORDIC method (Xi Chen, 2006) combines the CORDIC algorithm with a lookup table, using a custom 16-bit floating-point input format and a custom 32-bit floating-point output format. It consumes substantial computational resources and its accuracy is very low.

The genetic algorithm method (Bharat Kishore Bharkhada, 2004) uses a genetic algorithm to fit piecewise cubic polynomials with integer coefficients over [0, 8], in a 16-bit fixed-point format. Its computational resource usage is moderate and its storage usage is low, but its accuracy is not high, with an absolute error of $2.4376 \times 10^{-3}$.

Polynomial approximation is the most common approach. The traditional Taylor series expansion consumes a large amount of computational resources and has very low accuracy. The classic piecewise polynomial approximation algorithm (Joao O. P. Pinto, 2006) uses piecewise 5th-order polynomials. Its storage usage is low, its computational usage is moderate, and its accuracy is relatively high, with a maximum error of $8 \times 10^{-5}$. This is the best accuracy achieved in the prior art, but it still cannot meet high-precision computing requirements.
On data format, most of the above methods use custom floating-point formats to improve accuracy. In real-time high-speed processing, however, the data format is usually the IEEE 754 32-bit single-precision floating-point format. A custom format therefore requires format conversion whenever it communicates with other processing modules, which increases communication cost.

On resource consumption, the lookup table method obtains the result while greatly reducing computational resource consumption, but it significantly increases storage consumption.

On accuracy, the algorithms themselves have inherent limits, and resource constraints add to them. As a result, hardware implementation accuracy in the current state of the art is generally low and falls far short of the requirements of high-precision real-time processing. These are all bottleneck problems that urgently need to be solved.
Summary of the invention
To avoid the above shortcomings of the prior art, the present invention proposes a sigmoid function fitting hardware circuit based on the Remez approximation algorithm. The circuit improves computational accuracy, increases computation speed, and makes the computing structure more flexible, all while reducing hardware resource consumption.
To solve the technical problem, the present invention adopts the following technical scheme.

The sigmoid function fitting hardware circuit based on the Remez approximation algorithm of the present invention is characterized by the following steps:
Step 1: According to the given fitting precision u and the available computational and storage resources, determine the order n of the fitting polynomial;

Step 2: According to the fitting precision u, use formula (1) to obtain the fitting interval [a, b] of the sigmoid function f(x);
Step 3: Using the symmetry shown in formula (2), with the origin 0 as the center of symmetry, divide the fitting interval [a, b] into 2m+2 subintervals $[a, q_1], (q_1, q_2], \dots, (q_m, 0], (0, q_{m+1}], \dots, (q_{2m}, b]$. Here $a, q_1, q_2, \dots, q_m, 0, q_{m+1}, \dots, q_{2m}, b$ are the endpoint values of the 2m+2 subintervals, and $q_1, q_2, \dots, q_m, q_{m+1}, \dots, q_{2m}$ are the adjustable endpoint values of the 2m subintervals. These adjustable endpoint values form, in order, the endpoint set $Q = \{Q_0, Q_1, \dots, Q_t, \dots, Q_{2m-1}\}$, where $Q_t$ is the endpoint value of the t-th subinterval among them. This yields the piecewise intervals $[Q_0, Q_1], [Q_1, Q_2], \dots, [Q_t, Q_{t+1}], \dots, [Q_{2m-1}, Q_{2m}]$, $t = 0, 1, \dots, 2m-1$;
$$f(-x) = 1 - f(x) \qquad (2)$$
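Formula (2) is what allows scheme two to fit only x > 0 and recover the negative half with a single subtraction. The short Python check below confirms the identity numerically for the logistic sigmoid $f(x) = 1/(1+e^{-x})$. The helper name `sigmoid` is ours, not the patent's.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Formula (2): f(-x) = 1 - f(x). Compare both sides at a few points.
for x in (0.0, 0.5, 1.0, 4.0, 10.0, 13.816):
    lhs = sigmoid(-x)
    rhs = 1.0 - sigmoid(x)
    print(f"x={x:7.3f}  f(-x)={lhs:.12f}  1-f(x)={rhs:.12f}")
```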
Step 4: Combine the order n with each of the m piecewise intervals on (0, b] to form m vector groups $[n, Q_m, Q_{m+1}], [n, Q_{m+1}, Q_{m+2}], \dots, [n, Q_\varepsilon, Q_{\varepsilon+1}], \dots, [n, Q_{2m-1}, Q_{2m}]$, where $\varepsilon = m, m+1, \dots, 2m-1$ and $[n, Q_\varepsilon, Q_{\varepsilon+1}]$ denotes the ε-th vector group. Substitute the m vector groups in turn into the Remez algorithm to obtain the approximation accuracies $u''_m, u''_{m+1}, \dots, u''_{2m-1}$ of the corresponding piecewise intervals;
Step 4.1: Use formula (5) to obtain the n+2 alternation points of the Chebyshev polynomial corresponding to the ε-th vector group $[n, Q_\varepsilon, Q_{\varepsilon+1}]$. Take this ε-th alternation point set as the ε-th initial point set, which gives the initial point sets of all m vector groups;
In formula (5), λ = 0, 1, …, n+1;
Step 4.2: Using the ε-th initial point set, solve the system of linear equations shown in formula (6), and from its solution obtain the ε-th initial approximating polynomial $p'_\varepsilon(x)$;
Step 4.3: In the ε-th piecewise interval $[Q_\varepsilon, Q_{\varepsilon+1}]$, find the independent variable at which $|f(x) - p'_\varepsilon(x)|$ attains its maximum, and locate it relative to the points of the ε-th initial point set:
If … and …, then replace … with …;
If … and …, then replace … with …;
If … and …, then replace … with …;
where β = 1, 2, …, n. This yields the updated point set of the ε-th initial point set;
Step 4.4: Using the updated point set, solve the system of linear equations shown in formula (6) again, and from the updated solution obtain the ε-th updated approximating polynomial;
Check whether $|u''_\varepsilon - u'_\varepsilon| \le eps$ holds. If it does, take $u''_\varepsilon$ as the approximation accuracy of the ε-th piecewise interval $[Q_\varepsilon, Q_{\varepsilon+1}]$. Otherwise, repeat Steps 4.3 and 4.4 until $|u''_\varepsilon - u'_\varepsilon| \le eps$ holds. Here eps is the convergence control precision of the approximation error;
Step 5: Check, for each approximation accuracy $u''_m, u''_{m+1}, \dots, u''_{2m-1}$ in turn, whether it meets the fitting precision u.

- If it does, the corresponding piecewise interval becomes a fitting execution interval, and the coefficients of its approximating polynomial become the fitting polynomial coefficients of that interval.
- If it does not, adjust the adjustable endpoint values of that piecewise interval and return to Step 4.

Repeat until m fitting execution intervals and m sets of fitting polynomial coefficients satisfying the fitting precision u are obtained;
Step 6: If the independent variable x of the sigmoid function f(x) lies in (b, +∞), take (b, +∞) as a fitting execution interval. Its fitting polynomial has a constant term of 1 and all other coefficients equal to 0. This gives m+1 fitting polynomials of order n and completes the fitting of the sigmoid function;
Step 7: Store the coefficients of the m+1 fitting polynomials of order n in ROM to form the coefficient memory module;
Step 8: Based on the fitting polynomial of order n, design the polynomial operation module using n floating-point adders, 2n−1 floating-point multipliers, and (n−2)×k register units, and add a floating-point subtractor at the output of the polynomial operation module. Here k is the number of pipeline stages of the floating-point adders, multipliers, and subtractor;
Step 9: Design the judgment module according to the 2m+2 fitting execution intervals. The polynomial operation module, the coefficient memory module, the floating-point subtractor, and the judgment module together form the fitting hardware circuit;
Step 10: Input an operand ω as the input value of the fitting hardware circuit, and use the judgment module to determine the fitting execution interval in which ω lies;

Step 11: If ω ∈ (0, +∞), read from the coefficient memory module the fitting polynomial coefficients of the fitting execution interval that contains ω. If ω ∈ (−∞, 0], read from the coefficient memory module the fitting polynomial coefficients of the interval symmetric to the fitting execution interval that contains ω;
Step 12: Read the operand ω and its fitting polynomial coefficients into the polynomial operation module for the fitting calculation.

- If ω ∈ (0, +∞), the fitting result is the output value of the fitting hardware circuit.
- If ω ∈ (−∞, 0], the fitting result and 1 are read into the floating-point subtractor, and the difference is the output value of the fitting hardware circuit.
Compared with the prior art, the present invention has the following advantages:
1. The Remez approximation algorithm used in the present invention can satisfy different design requirements.
- If a design requires very low computational resource consumption together with high accuracy, m can be increased appropriately without changing the fitting precision u. This increases the number of subintervals and lowers the order n of the fitting polynomial, so that the design meets the requirement.
- If a design requires low storage resource consumption together with high accuracy, m can be decreased appropriately without changing u. This reduces the number of subintervals and hence the coefficient storage, so that the design meets the requirement.

This overcomes the low fitting precision and high resource consumption of the prior art, and gives the polynomial fitting hardware circuit greater flexibility in implementing polynomial fitting.
2. The present invention uses a polynomial coefficient memory module, which makes the hardware design highly scalable: for a different fitting scheme, only the coefficients stored in the memory module need to be rewritten.
3. The present invention uses n floating-point adders, 2n−1 floating-point multipliers, and (n−2)×k register units that hold the operand and the intermediate results of the corresponding stages. The circuit can therefore perform pipelined computation on single-precision floating-point numbers, which increases computation speed and allows the design to meet high-speed real-time requirements.
4. The present invention uses a judgment module that combines the lookup-table method with piecewise nonlinear approximation. This extends the execution interval of the fitting function so that a result can be obtained for any operand over the entire real line.
5. Using the symmetry of the sigmoid function, scheme two only needs to apply the Remez fitting on the interval (0, b]. Without affecting accuracy, this halves the resource consumption of the coefficient memory module and halves the number of fitting polynomial coefficients to be solved.
6. Using the symmetry of the sigmoid function, scheme two adds a subtractor outside the polynomial operation module that performs the subtraction on the fitting results of operands in (−∞, a]. The final result is thus obtained quickly and accurately without affecting accuracy.
7. The present invention can use different data formats. For data in the IEEE 754 single-precision floating-point format, the achievable fitting precision is $10^{-6}$ or better. For data in other custom floating-point formats, at the same resource consumption, the circuit of the present invention achieves higher fitting precision than other circuits.
Description of the drawings
Fig. 1 is a schematic diagram of the hardware circuit of scheme one of the present invention;
Fig. 2 is the operation flow chart of scheme one of the present invention;
Fig. 3 is an example implementation of the polynomial operation circuit structure of scheme one of the present invention;
Fig. 4 is a schematic diagram of the hardware circuit of scheme two of the present invention;
Fig. 5 is the operation flow chart of scheme two of the present invention;
Fig. 6 is an example implementation of the polynomial operation circuit structure of scheme two of the present invention.
Specific embodiment
In this embodiment, the sigmoid function fitting hardware circuit based on the Remez approximation algorithm is implemented by the following steps:

Step 1: According to the given fitting precision u and the available computational and storage resources, determine the order n of the fitting polynomial;
Step 2: According to the fitting precision u, use formula (1) to obtain the fitting interval [a, b] of the sigmoid function f(x). For example, a specific implementation gives the fitting precision $u = 10^{-6}$ and the fitting polynomial order n = 5, and the resulting fitting interval is [a, b] = [−13.816, 13.816];
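Formula (1) itself is not reproduced in this text. One choice that reproduces the embodiment's numbers is to pick b so that the sigmoid is within u of its limit 1 at x = b, i.e. $1 - f(b) = u$, which gives $b = \ln\frac{1-u}{u}$. For $u = 10^{-6}$ this yields $b \approx 13.8155$, matching the stated 13.816. The sketch below assumes this form; `fit_interval` is a hypothetical name and the formula is our assumption, not a transcription of formula (1).

```python
import math


def fit_interval(u: float) -> tuple[float, float]:
    """Hypothetical stand-in for formula (1): choose b with 1 - f(b) = u.

    For x > b the sigmoid is within u of 1, so the constant polynomial 1
    already meets the precision there (compare Step 6); a = -b by symmetry.
    """
    b = math.log((1.0 - u) / u)
    return -b, b


a, b = fit_interval(1e-6)
print(round(a, 3), round(b, 3))  # -13.816 13.816
```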
Step 3: Using the symmetry shown in formula (2), with the origin 0 as the center of symmetry, divide the fitting interval [a, b] into 2m+2 subintervals $[a, q_1], (q_1, q_2], \dots, (q_m, 0], (0, q_{m+1}], \dots, (q_{2m}, b]$. Here $a, q_1, q_2, \dots, q_m, 0, q_{m+1}, \dots, q_{2m}, b$ are the endpoint values of the 2m+2 subintervals, and $q_1, q_2, \dots, q_m, q_{m+1}, \dots, q_{2m}$ are the adjustable endpoint values of the 2m subintervals. These adjustable endpoint values form, in order, the endpoint set $Q = \{Q_0, Q_1, \dots, Q_t, \dots, Q_{2m-1}\}$, where $Q_t$ is the endpoint value of the t-th subinterval among them. This yields the piecewise intervals $[Q_0, Q_1], [Q_1, Q_2], \dots, [Q_t, Q_{t+1}], \dots, [Q_{2m-1}, Q_{2m}]$, $t = 0, 1, \dots, 2m-1$;
In this embodiment, m = 7. The fitting interval [−13.816, 13.816] is divided into 14 subintervals [−13.816, −10], (−10, −8], (−8, −6], (−6, −4], (−4, −2], (−2, −1], (−1, 0], (0, 1], (1, 2], (2, 4], (4, 6], (6, 8], (8, 10], (10, 13.816], which are, in order, the 14 piecewise intervals;
$$f(-x) = 1 - f(x) \qquad (2)$$
The symmetry in formula (2) allows two approaches. The sigmoid function f(x) can be fitted over the entire fitting interval. Alternatively, it can be fitted only for x ∈ (0, +∞), and the result for x ∈ (−∞, 0] obtained from formula (2) and the fitting result on the symmetric interval. The fitting of the sigmoid function can therefore be realized by two schemes. Scheme one is as follows:
Step 4: Combine the order n with each of the 2m piecewise intervals to form 2m vector groups $[n, Q_0, Q_1], [n, Q_1, Q_2], \dots, [n, Q_t, Q_{t+1}], \dots, [n, Q_{2m-1}, Q_{2m}]$, where $[n, Q_t, Q_{t+1}]$ denotes the t-th vector group. In this embodiment, the 14 vector groups are, in order, [5, −13.816, −10], [5, −10, −8], [5, −8, −6], [5, −6, −4], [5, −4, −2], [5, −2, −1], [5, −1, 0], [5, 0, 1], [5, 1, 2], [5, 2, 4], [5, 4, 6], [5, 6, 8], [5, 8, 10], [5, 10, 13.816]. Substitute the 14 vector groups in turn into the Remez algorithm to obtain the approximation accuracies $u''_0, u''_1, \dots, u''_t, \dots, u''_{2m-1}$ of the corresponding piecewise intervals;
Step 4.1: Use formula (3) to obtain the n+2 alternation points of the Chebyshev polynomial corresponding to the t-th vector group $[n, Q_t, Q_{t+1}]$. Take this t-th alternation point set as the t-th initial point set, which gives the initial point sets of all 2m vector groups;

In formula (3), k = 0, 1, …, n+1;
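Formula (3) itself is not reproduced in this text. If it is the usual Remez starting reference, the initial points are the n+2 extrema of the Chebyshev polynomial $T_{n+1}$, mapped from [−1, 1] onto the t-th interval. The superscript notation $x_k^{(t)}$ is ours:

$$x_k^{(t)} = \frac{Q_t + Q_{t+1}}{2} - \frac{Q_{t+1} - Q_t}{2}\cos\frac{k\pi}{n+1}, \qquad k = 0, 1, \dots, n+1$$

With this choice $x_0^{(t)} = Q_t$ and $x_{n+1}^{(t)} = Q_{t+1}$, so both interval endpoints belong to the initial point set.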
Step 4.2: Using the t-th initial point set, solve the system of linear equations shown in formula (4), and from its solution obtain the t-th initial approximating polynomial $p'_t(x)$;
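Formula (4) is also not reproduced here. The standard Remez system, which we assume it corresponds to, requires the polynomial error to alternate in sign with equal magnitude $|E_t|$ at the n+2 points of the t-th point set. Writing $x_k^{(t)}$ for the k-th point of that set (notation ours):

$$\sum_{j=0}^{n} c_j \left(x_k^{(t)}\right)^{j} + (-1)^{k} E_t = f\!\left(x_k^{(t)}\right), \qquad k = 0, 1, \dots, n+1$$

This gives n+2 linear equations in the n+2 unknowns $c_0, \dots, c_n, E_t$. The solution defines $p'_t(x) = \sum_{j=0}^{n} c_j x^j$, and $|E_t|$ is the levelled error on that point set.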
Step 4.3: In the t-th piecewise interval $[Q_t, Q_{t+1}]$, find the independent variable at which $|f(x) - p'_t(x)|$ attains its maximum, and locate it relative to the points of the t-th initial point set:
If … and …, then replace … with …;
If … and …, then replace … with …;
If … and …, then replace … with …;
This yields the updated point set of the t-th initial point set;
Step 4.4: Using the updated point set of the t-th initial point set, solve the system of linear equations shown in formula (4) again, and from the updated solution obtain the t-th updated approximating polynomial;
Check whether $|u''_t - u'_t| \le eps$ holds. If it does, take $u''_t$ as the approximation accuracy of the t-th piecewise interval $[Q_t, Q_{t+1}]$. Otherwise, repeat Steps 4.3 and 4.4 until $|u''_t - u'_t| \le eps$ holds. Here eps is the convergence control precision of the approximation error;
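Steps 4.1 to 4.4 can be sketched in Python with NumPy as a generic single-point-exchange Remez iteration. This is not the patent's own procedure. We assume the initial points are the Chebyshev extrema of $T_{n+1}$ mapped onto the interval, and that the linear system is $p(x_k) + (-1)^k E = f(x_k)$. The exchange uses the textbook single-point rule, because the patent's exact case conditions are not legible here. The stopping test also differs: it checks that the grid maximum error is within a relative tolerance `rtol` of the levelled error |E|, rather than the patent's $|u'' - u'| \le eps$. The names `remez`, `rtol`, and `grid` are illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def remez(f, lo, hi, n, rtol=1e-3, max_iter=100, grid=20001):
    """Degree-n minimax fit of f on [lo, hi] by single-point exchange.

    Returns (p, err): p is a numpy Polynomial in x, err the max |f - p| on a grid.
    """
    k = np.arange(n + 2)
    signs = (-1.0) ** k
    # Step 4.1 (assumed form): n+2 Chebyshev extrema mapped onto [lo, hi].
    x = 0.5 * (lo + hi) - 0.5 * (hi - lo) * np.cos(k * np.pi / (n + 1))
    xs = np.linspace(lo, hi, grid)                 # search grid for Step 4.3
    for _ in range(max_iter):
        # Steps 4.2 / 4.4: solve p(x_k) + (-1)^k E = f(x_k) in the scaled
        # variable t in [-1, 1], which keeps the Vandermonde system well conditioned.
        t = (2.0 * x - lo - hi) / (hi - lo)
        A = np.column_stack([np.vander(t, n + 1, increasing=True), signs])
        sol = np.linalg.solve(A, f(x))
        p = Polynomial(sol[:-1], domain=[lo, hi])  # same [lo, hi] -> [-1, 1] map
        level = abs(sol[-1])                       # levelled error |E|
        # Step 4.3: point of maximum |f - p| on the grid.
        e = f(xs) - p(xs)
        j = int(np.argmax(np.abs(e)))
        xstar, err = xs[j], abs(e[j])
        if err - level <= rtol * err:              # max error ~ levelled error
            return p, err
        s = np.sign(e[j])
        ref = np.sign(signs * sol[-1])             # error signs at the reference
        # Single-point exchange that keeps the error signs alternating.
        if xstar < x[0]:
            if ref[0] == s:
                x[0] = xstar
            else:
                x = np.concatenate(([xstar], x[:-1]))
        elif xstar > x[-1]:
            if ref[-1] == s:
                x[-1] = xstar
            else:
                x = np.concatenate((x[1:], [xstar]))
        else:
            i = min(int(np.searchsorted(x, xstar, side="right")) - 1, n)
            if ref[i] == s:
                x[i] = xstar
            else:
                x[i + 1] = xstar
    return p, err


p, err = remez(sigmoid, 0.0, 1.0, 5)
print(err)               # max |f - p| on (0, 1] for n = 5
print(p.convert().coef)  # F, E, D, C, B, A: coefficients of p(x) in powers of x
```

The polynomial is solved in a scaled variable for numerical stability. `Polynomial.convert()` then re-expresses it in plain powers of x, the form a coefficient ROM would hold.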
Step 5: Check, for each approximation accuracy $u''_0, u''_1, \dots, u''_t, \dots, u''_{2m-1}$ in turn, whether it meets the fitting precision u.

- If it does, the corresponding piecewise interval becomes a fitting execution interval, and the coefficients of its approximating polynomial become the fitting polynomial coefficients of that interval.
- If it does not, adjust the adjustable endpoint values of that piecewise interval and return to Step 4.

Repeat until 2m fitting execution intervals and 2m sets of fitting polynomial coefficients satisfying the fitting precision u are obtained;
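Step 5 only says that an interval failing the precision has its adjustable endpoints rescaled. The sketch below uses one possible rule: halve any failing interval until every piece passes. To keep it short, it measures accuracy with NumPy's `Chebyshev.interpolate` (a near-minimax fit) instead of a full Remez run, so its breakpoints need not match the embodiment's. The names `max_err` and `segment` and the halving rule are all hypothetical.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def max_err(f, lo, hi, n, grid=4001):
    """Max |f - q| on [lo, hi] for a degree-n Chebyshev interpolant q.

    Stand-in for the Remez accuracy u'' of Step 4 (near-minimax, not minimax).
    """
    q = Chebyshev.interpolate(f, n, domain=[lo, hi])
    xs = np.linspace(lo, hi, grid)
    return float(np.max(np.abs(f(xs) - q(xs))))


def segment(f, lo, hi, n, u):
    """Step 5 sketch: split failing intervals until each meets precision u."""
    todo, done = [(lo, hi)], []
    while todo:
        a, b = todo.pop()
        if max_err(f, a, b, n) <= u:
            done.append((a, b))        # becomes a fitting execution interval
        else:                          # hypothetical rule: halve the interval
            mid = 0.5 * (a + b)
            todo += [(mid, b), (a, mid)]
    return sorted(done)


for seg in segment(sigmoid, 0.0, 13.816, 5, 1e-6):
    print(seg)
```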
Step 6: If the independent variable x of the sigmoid function f(x) lies in (b, +∞), take (b, +∞) as a fitting execution interval. Its fitting polynomial has a constant term of 1 and all other coefficients equal to 0. If x lies in (−∞, a), take (−∞, a) as a fitting execution interval, and all coefficients of its fitting polynomial are 0. This gives 2m+2 fitting polynomials of order n and completes the fitting of the sigmoid function;
In this embodiment, the 5th-order fitting polynomial for the interval (13.816, +∞) has a constant term of 1, and all its other coefficients are 0. All coefficients of the 5th-order fitting polynomial for the interval (−∞, −13.816) are 0;
After Steps 5 and 6, this embodiment obtains 16 fitting execution intervals: (−∞, −13.816), [−13.816, −11], (−11, −7], (−7, −5], (−5, −3], (−3, −2], (−2, −1], (−1, 0], (0, 1], (1, 2], (2, 3], (3, 5], (5, 7], (7, 11], (11, 13.816], (13.816, +∞). This completes the fitting of the sigmoid function.
Step 7: Store the coefficients of the 2m+2 fitting polynomials of order n in ROM to form the coefficient memory module. In this embodiment, the polynomial coefficients of the 16 fitting execution intervals are stored in ROM, with addresses written and read according to a storage rule, forming a coefficient lookup table.
Step 8: Based on the fitting polynomial of order n, design the polynomial operation module using n floating-point adders, 2n−1 floating-point multipliers, and (n−2)×k register units, where k is the number of pipeline stages of the floating-point adders or multipliers. In this embodiment, the polynomial operation module uses 5 floating-point adders, 9 floating-point multipliers, and 6 register units (reg), and each floating-point operator has 2 pipeline stages.
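The operator counts in Step 8 can be reproduced from the structure of the Fig. 3 datapath:

- n − 1 multipliers form the powers $x^2, \dots, x^n$.
- n multipliers form the coefficient products.
- n adders fold those products into the constant term.
- The operand x is delayed through n − 2 register stages, each k cycles deep, because $x^j$ is formed at stage j − 1 and the last use of x is at stage n − 1.

This breakdown is our reading of steps a to f, not a formula stated in the patent; `poly_module_resources` is a hypothetical helper.

```python
def poly_module_resources(n: int, k: int) -> dict:
    """Operator counts for the n-th order polynomial module of Step 8.

    n: polynomial order; k: pipeline depth of each floating-point unit.
    """
    powers = n - 1      # multipliers forming x^2, x^3, ..., x^n
    products = n        # multipliers forming c_1*x, c_2*x^2, ..., c_n*x^n
    return {
        "adders": n,                        # one add per product, onto the constant term
        "multipliers": powers + products,   # = 2n - 1
        "registers": (n - 2) * k,           # x delayed through stages 1..n-2
        "stages": n + 1,                    # operator stages (a..f when n = 5)
    }


print(poly_module_resources(5, 2))
# {'adders': 5, 'multipliers': 9, 'registers': 6, 'stages': 6}
```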
Step 9: Design the judgment module according to the 2m+2 fitting execution intervals. The polynomial operation module, the coefficient memory module, and the judgment module form the fitting hardware circuit shown in Fig. 1. In Fig. 1, data_i is the input source operand and data_o is the output result.
Step 10: As shown in Fig. 2, input an operand ω as the input value of the fitting hardware circuit, and use the judgment module to determine the fitting execution interval in which ω lies;

Step 11: Read from the coefficient memory module the fitting polynomial coefficients of the fitting execution interval in which ω lies;

Step 12: Read the operand ω and its fitting polynomial coefficients into the polynomial operation module for the fitting calculation. The fitting result is the output value of the fitting hardware circuit.
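Steps 10 to 12 of scheme one amount to a range search followed by a polynomial evaluation. The Python sketch below uses the breakpoints of the 16 fitting execution intervals listed above. It takes the interval closure from that list: [−13.816, −11] is closed on the left and the other intervals are closed on the right.

The actual ROM coefficients are not published in this text, so the table below fills only the two outer rows given in Step 6 and uses placeholders for the rest. Evaluation is done in Horner form for brevity, which is a different operator arrangement from the Fig. 3 pipeline. `interval_index`, `fit`, and `rom` are hypothetical names.

```python
from bisect import bisect_left

# Breakpoints of the 16 scheme-one fitting execution intervals (embodiment).
BOUNDS = [-13.816, -11, -7, -5, -3, -2, -1, 0, 1, 2, 3, 5, 7, 11, 13.816]


def interval_index(x: float) -> int:
    """Judgment-module sketch: ROM row 0..15 for the interval containing x."""
    if x == BOUNDS[0]:
        return 1                      # [-13.816, -11] is closed on the left
    return bisect_left(BOUNDS, x)     # number of breakpoints strictly below x


def fit(x: float, rom) -> float:
    """Look up x's row and evaluate A x^5 + B x^4 + C x^3 + D x^2 + E x + F."""
    A, B, C, D, E, F = rom[interval_index(x)]
    return ((((A * x + B) * x + C) * x + D) * x + E) * x + F


# Only the outer rows are known: all zero below -13.816, constant 1 above 13.816.
rom = [(0.0,) * 6 for _ in range(16)]
rom[15] = (0.0, 0.0, 0.0, 0.0, 0.0, 1.0)
print(fit(-20.0, rom), fit(20.0, rom))  # 0.0 1.0
```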
The resulting polynomial operation module is shown in Fig. 3, which is the hardware structure of the 5th-order fitting polynomial for scheme one of this embodiment. It uses the IEEE 754 single-precision floating-point format, with a fitting precision of $10^{-6}$ or better, and comprises 9 multipliers, 5 adders, and 6 register units (reg). The implemented polynomial is $p(x) = Ax^5 + Bx^4 + Cx^3 + Dx^2 + Ex + F$, and Result is the final output of the computation. The computation proceeds as follows:
Step a: The source operand x enters the polynomial operation module, and coefficient E is read.
- x enters multiplier Multi_1, which computes $E x$ and passes it to the next stage.
- x enters multiplier Multi_2, which computes $x^2$ and passes it to the next stage.
- x enters reg_1, where it is held for two cycles before taking part in the next stage's operation.

The 2 multipliers of the first stage operate in parallel, and every multiplier has 2 pipeline stages;
Step b:
- Coefficient F and $E x$ enter adder Add_1, which computes $E x + F$ and passes the result to the next stage.
- Coefficient D and $x^2$ enter multiplier Multi_3, which computes $D x^2$ and passes it to the next stage.
- $x^2$ and x enter multiplier Multi_4, which computes $x^3$ and passes it to the next stage.
- The x held by the previous stage enters reg_2 and is held for two more cycles for the next stage's operation.

The 3 floating-point operators of the second stage operate in parallel, each with 2 pipeline stages;
Step c:
- $(E x + F)$ and $D x^2$ enter adder Add_2, which computes $D x^2 + E x + F$ and passes it to the next stage.
- Coefficient C and $x^3$ enter multiplier Multi_5, which computes $C x^3$ and passes it to the next stage.
- $x^3$ and the x held by the previous stage enter Multi_6, which computes $x^4$ and passes it to the next stage.
- The x held by the previous stage enters reg_3 and is held for two more cycles for the next stage's operation.

The 3 floating-point operators of the third stage operate in parallel, each with 2 pipeline stages;
Step d:
- $(D x^2 + E x + F)$ and $C x^3$ enter adder Add_3, which computes $C x^3 + D x^2 + E x + F$ and passes it to the next stage.
- Coefficient B and $x^4$ enter multiplier Multi_7, which computes $B x^4$ and passes it to the next stage.
- $x^4$ and the x held by the previous stage enter multiplier Multi_8, which computes $x^5$ and passes it to the next stage.

The 3 floating-point operators of the fourth stage operate in parallel, each with 2 pipeline stages;
Step e:
- $(C x^3 + D x^2 + E x + F)$ and $B x^4$ enter adder Add_4, which computes $B x^4 + C x^3 + D x^2 + E x + F$ and passes it to the next stage.
- Coefficient A and $x^5$ enter multiplier Multi_9, which computes $A x^5$ and passes it to the next stage.

The 2 floating-point operators of the fifth stage operate in parallel, each with 2 pipeline stages;
Step f: Adder Add_5 computes $A x^5 + B x^4 + C x^3 + D x^2 + E x + F$ and outputs it; the adder has 2 pipeline stages. This result is the final result and is output directly;
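The dataflow of steps a to f can be traced in Python: each dict holds the values latched at the end of one operator stage, and `x` is the operand copy carried forward by reg_1 to reg_3. This models only which operator consumes which value. It does not model cycle timing or single-precision rounding, since Python floats are double precision. `pipeline_eval` is a hypothetical name.

```python
def pipeline_eval(x, A, B, C, D, E, F):
    """Stage-by-stage dataflow of the Fig. 3 datapath (steps a-f)."""
    s1 = {"Ex": E * x, "x2": x * x, "x": x}                    # a: Multi_1, Multi_2, reg_1
    s2 = {"acc": s1["Ex"] + F, "Dx2": D * s1["x2"],
          "x3": s1["x2"] * s1["x"], "x": s1["x"]}               # b: Add_1, Multi_3, Multi_4, reg_2
    s3 = {"acc": s2["acc"] + s2["Dx2"], "Cx3": C * s2["x3"],
          "x4": s2["x3"] * s2["x"], "x": s2["x"]}               # c: Add_2, Multi_5, Multi_6, reg_3
    s4 = {"acc": s3["acc"] + s3["Cx3"], "Bx4": B * s3["x4"],
          "x5": s3["x4"] * s3["x"]}                             # d: Add_3, Multi_7, Multi_8
    s5 = {"acc": s4["acc"] + s4["Bx4"], "Ax5": A * s4["x5"]}    # e: Add_4, Multi_9
    return s5["acc"] + s5["Ax5"]                                # f: Add_5


print(pipeline_eval(2.0, 1, 2, 3, 4, 5, 6))  # 120.0
```

Counting operators per stage gives 2 + 2 + 2 + 2 + 1 = 9 multipliers and 5 adders, matching the counts stated for this embodiment.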
After these steps, the sigmoid function fitting process of the present invention is complete. Counting the clock cycles of each step in this example: each stage has 2 pipeline stages, there are 6 stages in total, and fitting a single source operand takes 13 clock cycles. The fitting precision is $10^{-6}$ or better, and the maximum mean square error does not exceed $8.74 \times 10^{-14}$. This fitting precision is much higher than the best in the current prior art, resource consumption is lower, and the data format is the IEEE 754 single-precision floating-point format. The circuit is therefore well suited to high-precision, high-speed real-time computation.
Scheme one uses fewer floating-point operators and fewer floating-point stages, so it computes faster. However, its coefficient memory module must store more fitting polynomial coefficients, which increases storage consumption. In addition, although the fitting precision over the whole sigmoid function is very high, different fitting polynomials are used on each side of the origin, so two fitting intervals symmetric about the origin will have different fitting precisions.
Scheme two: Steps 4 to 12 can also be carried out as follows:

Step 4: Combine the order n with each of the m piecewise intervals on (0, b] to form m vector groups $[n, Q_m, Q_{m+1}], [n, Q_{m+1}, Q_{m+2}], \dots, [n, Q_\varepsilon, Q_{\varepsilon+1}], \dots, [n, Q_{2m-1}, Q_{2m}]$, where $\varepsilon = m, m+1, \dots, 2m-1$ and $[n, Q_\varepsilon, Q_{\varepsilon+1}]$ denotes the ε-th vector group. Substitute the m vector groups in turn into the Remez algorithm to obtain the approximation accuracies $u''_m, u''_{m+1}, \dots, u''_{2m-1}$ of the corresponding piecewise intervals;
Step 4.1: Use formula (5) to obtain the n+2 alternation points of the Chebyshev polynomial corresponding to the ε-th vector group $[n, Q_\varepsilon, Q_{\varepsilon+1}]$. Take this ε-th alternation point set as the ε-th initial point set, which gives the initial point sets of all m vector groups;
In formula (5), λ = 0, 1, …, n+1;
Step 4.2: Using the ε-th initial point set, solve the system of linear equations shown in formula (6), and from its solution obtain the ε-th initial approximating polynomial $p'_\varepsilon(x)$;
Step 4.3: In the ε-th piecewise interval $[Q_\varepsilon, Q_{\varepsilon+1}]$, find the independent variable at which $|f(x) - p'_\varepsilon(x)|$ attains its maximum, and locate it relative to the points of the ε-th initial point set:
If … and …, then replace … with …;
If … and …, then replace … with …;
If … and …, then replace … with …;
where β = 1, 2, …, n. This yields the updated point set of the ε-th initial point set;
Step 4.4: Using the updated point set of the ε-th initial point set, solve the system of linear equations shown in formula (6) again, and from the updated solution obtain the ε-th updated approximating polynomial;
Check whether $|u_\varepsilon'' - u_\varepsilon'| \le eps$ holds. If it does, take $u_\varepsilon''$ as the approximation accuracy of the $\varepsilon$-th piecewise interval $[Q_\varepsilon, Q_{\varepsilon+1}]$; otherwise, repeat steps 4.3 and 4.4 until $|u_\varepsilon'' - u_\varepsilon'| \le eps$ holds. Here eps is the convergence tolerance for the approximation error.
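Steps 4.1 to 4.4 can be sketched in Python as follows. This is a minimal illustration of the textbook Remez exchange algorithm, not the patent's implementation: because formulas (5) and (6) and the exact exchange conditions of step 4.3 are missing from this text, the initial points, the linear system and the single-point exchange rule used here are the standard ones and should be read as assumptions. The maximum of $|f(x) - p_\varepsilon'(x)|$ is located on a dense sample grid rather than exactly. The function names `remez`, `solve` and `exchange` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def solve(rows, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    size = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(rows)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * size
    for r in range(size - 1, -1, -1):
        acc = m[r][size] - sum(m[r][c] * sol[c] for c in range(r + 1, size))
        sol[r] = acc / m[r][r]
    return sol


def exchange(pts, x_star, err):
    """Standard single-point exchange that keeps the error signs alternating."""
    pts = pts[:]

    def same(x):
        return (err(x) > 0) == (err(x_star) > 0)

    if x_star < pts[0]:
        pts = [x_star] + pts[1:] if same(pts[0]) else [x_star] + pts[:-1]
    elif x_star > pts[-1]:
        pts = pts[:-1] + [x_star] if same(pts[-1]) else pts[1:] + [x_star]
    else:
        for k in range(len(pts) - 1):
            if pts[k] <= x_star <= pts[k + 1]:
                if same(pts[k]):
                    pts[k] = x_star
                else:
                    pts[k + 1] = x_star
                break
    return pts


def remez(f, lo, hi, n, eps=1e-12, max_iter=50, grid=2000):
    """Degree-n minimax fit of f on [lo, hi]; returns (c_0..c_n, levelled error)."""
    # Step 4.1 (assumed form): the n+2 extrema of T_{n+1} mapped onto [lo, hi].
    pts = [(lo + hi) / 2 - (hi - lo) / 2 * math.cos(k * math.pi / (n + 1))
           for k in range(n + 2)]
    xs = [lo + (hi - lo) * j / grid for j in range(grid + 1)]
    u_prev = None
    for _ in range(max_iter):
        # Steps 4.2 / 4.4: sum_i c_i x_k^i + (-1)^k u = f(x_k) at every point.
        rows = [[x ** i for i in range(n + 1)] + [(-1) ** k] for k, x in enumerate(pts)]
        sol = solve(rows, [f(x) for x in pts])
        coeffs, u = sol[:-1], abs(sol[-1])
        # Convergence test from the text: |u'' - u'| <= eps.
        if u_prev is not None and abs(u - u_prev) <= eps:
            break
        u_prev = u

        def err(x, c=coeffs):
            return f(x) - sum(ci * x ** i for i, ci in enumerate(c))

        # Step 4.3: grid point of maximum |f - p|, exchanged into the point set.
        x_star = max(xs, key=lambda x: abs(err(x)))
        pts = exchange(pts, x_star, err)
    return coeffs, u
```

For example, `remez(sigmoid, 0.0, 1.0, 5)` returns six monomial coefficients for the interval $(0, 1]$ together with the levelled error. For intervals far from the origin, such as $(11, 13.816]$, the monomial system can become ill-conditioned; a shifted variable would be numerically safer, but the hardware described here evaluates $p(x)$ directly in $x$.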
Step 5: Check in turn whether each approximation accuracy $u_m'', u_{m+1}'', \ldots, u_t'', \ldots, u_{2m-1}''$ satisfies the fitting precision $u$. If it does, the corresponding piecewise interval becomes a fitting execution interval, and the coefficients of the corresponding approximating polynomial become the fitting polynomial coefficients of that interval. If it does not, adjust the partition endpoint value of the piecewise interval whose approximation accuracy fails, and return to step 4. Repeat until $m$ fitting execution intervals that satisfy the fitting precision $u$, together with $m$ sets of fitting polynomial coefficients, have been obtained;
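The endpoint adjustment of step 5 can be pictured as a greedy search: starting from the left end of $(0, b]$, push the right endpoint as far as the precision $u$ allows, then start the next interval there. The sketch below is an illustration under stated assumptions, not the patent's procedure. The text does not say how endpoints are adjusted, so bisection is assumed; and to keep the block short, the Remez error is replaced by the maximum error (measured on a sample grid) of degree-$n$ interpolation at Chebyshev points. Since no degree-$n$ polynomial beats the minimax error, this substitute is conservative and may produce a finer partition. `interp_error` and `partition` are hypothetical names.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def interp_error(f, lo, hi, n, samples=400):
    """Max |f - p| on a grid over [lo, hi], p = degree-n Chebyshev interpolant.

    Uses the barycentric formula with Chebyshev points of the second kind.
    """
    nodes = [(lo + hi) / 2 + (hi - lo) / 2 * math.cos(j * math.pi / n) for j in range(n + 1)]
    weights = [(-1) ** j * (0.5 if j in (0, n) else 1.0) for j in range(n + 1)]
    values = [f(x) for x in nodes]
    worst = 0.0
    for s in range(samples + 1):
        x = lo + (hi - lo) * s / samples
        num = den = 0.0
        p = None
        for xj, wj, fj in zip(nodes, weights, values):
            if x == xj:  # exactly on a node: the interpolant equals f there
                p = fj
                break
            t = wj / (x - xj)
            num += t * fj
            den += t
        if p is None:
            p = num / den
        worst = max(worst, abs(f(x) - p))
    return worst


def partition(f, a, b, n, u, tol=1e-3):
    """Greedy left-to-right partition of [a, b] into intervals meeting precision u."""
    cuts = [a]
    lo = a
    while True:
        if interp_error(f, lo, b, n) <= u:
            cuts.append(b)
            return cuts
        good, bad = lo, b  # [lo, good] passes (or is empty); [lo, bad] fails
        while bad - good > tol:
            mid = (good + bad) / 2
            if interp_error(f, lo, mid, n) <= u:
                good = mid
            else:
                bad = mid
        if good == lo:
            raise ValueError("precision u cannot be met with this order n")
        cuts.append(good)
        lo = good
```

Calling `partition(sigmoid, 0.0, 13.816, 5, 1e-6)` returns a list of cut points from 0 to 13.816. Because interpolation error stands in for the Remez error, these cuts need not coincide with the embodiment's intervals.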
Step 6: If the independent variable $x$ of the sigmoid function $f(x)$ lies in the interval $(b, +\infty)$, take $(b, +\infty)$ as a fitting execution interval; the fitting polynomial for $(b, +\infty)$ has a constant-term coefficient of 1 and all other coefficients equal to 0. This yields $m+1$ fitting polynomials of order $n$ and completes the fitting of the sigmoid function;
In this embodiment, the fifth-order fitting polynomial for the interval $(13.816, +\infty)$ has a constant-term coefficient of 1, and all of its other coefficients are 0;
Steps 5 and 6 yield the 8 fitting execution intervals of this embodiment, $(0, 1], (1, 2], (2, 3], (3, 5], (5, 7], (7, 11], (11, 13.816]$ and $(13.816, +\infty)$, which completes the fitting of the sigmoid function.
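Formula (1) for the fit interval is not reproduced here, but the embodiment's value $b = 13.816$ is consistent with choosing $b$ so that the constant fit $f(x) \approx 1$ on $(b, +\infty)$ errs by at most $u$: setting $1 - f(b) = u$ gives $b = \ln(1/u - 1)$. The sketch below only checks that arithmetic for $u = 10^{-6}$; treating this as the criterion behind formula (1) is an assumption, and `tail_threshold` is a hypothetical name.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def tail_threshold(u: float) -> float:
    """Assumed criterion: 1 - f(b) = u, i.e. b = ln(1/u - 1)."""
    return math.log(1.0 / u - 1.0)


u = 1e-6
b = tail_threshold(u)
print(round(b, 3))  # 13.816
# Beyond b, 1 - f(x) = 1 / (1 + e^x) only shrinks, so the constant 1 stays within u.
print(1.0 - sigmoid(20.0) < u)  # True
```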
Step 7: Store the coefficients of the $m+1$ fitting polynomials of order $n$ in ROM to form the coefficient memory module. In this embodiment, the polynomial coefficients of the 8 fitting execution intervals are stored in ROM, and the address write and read rules are defined according to a storage rule, forming a coefficient lookup table.
Step 8: Based on the fitting polynomial of order $n$, design the polynomial operation module using $n$ floating-point adders, $2n-1$ floating-point multipliers and $(n-2) \times k$ register units, and add one floating-point subtracter at the output of the polynomial operation module, where $k$ is the pipeline depth of the floating-point adders, floating-point multipliers and floating-point subtracter. In this embodiment, the polynomial operation module uses 5 floating-point adders, 9 floating-point multipliers and 6 reg register units, and every floating-point unit has a pipeline depth of 2 stages.
Step 9: Design the judgment module according to the $2m+2$ fitting execution intervals. The polynomial operation module, the coefficient memory module, the floating-point subtracter and the judgment module together form the fitting hardware circuit shown in Fig. 4, in which data_i is the input source operand and data_o is the output result.
Step 10: As shown in Fig. 5, input the operand $\omega$ as the input value of the fitting hardware circuit, and use the judgment module to determine the fitting execution interval in which $\omega$ lies;
Step 11: If $\omega \in (0, +\infty)$, read from the coefficient memory module the coefficients of the fitting polynomial for the fitting execution interval in which $\omega$ lies; if $\omega \in (-\infty, 0]$, read from the coefficient memory module the coefficients of the fitting polynomial for the interval symmetric to the fitting execution interval in which $\omega$ lies;
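The judgment module (steps 9 and 10) and the coefficient read from the ROM built in step 7 amount to an interval search followed by a table lookup. The sketch below illustrates this with the embodiment's 8 fitting execution intervals. The ROM layout (one row of $n+1$ coefficients per interval, constant term first, address = interval index × $(n+1)$ + term index) and the names `select_interval` and `read_coefficients` are hypothetical, since the text does not give the storage rule. Only the tail row $(1, 0, 0, 0, 0, 0)$ comes from the text; the other rows are left as placeholders rather than invented coefficients. For $\omega \le 0$, the symmetric interval is found by searching with $-\omega$.

```python
import bisect

N = 5  # polynomial order used in the embodiment
# Upper endpoints of (0,1], (1,2], (2,3], (3,5], (5,7], (7,11], (11,13.816];
# anything above 13.816 falls into the tail interval (13.816, +inf).
UPPER = [1.0, 2.0, 3.0, 5.0, 7.0, 11.0, 13.816]
TAIL = len(UPPER)

# Hypothetical flat ROM image: one row of N+1 coefficients per interval,
# constant term first. Rows 0..6 are placeholders for the fitted coefficients.
ROM = [0.0] * ((len(UPPER) + 1) * (N + 1))
ROM[TAIL * (N + 1)] = 1.0  # tail polynomial: constant 1, all other terms 0


def select_interval(omega: float) -> int:
    """Judgment module: index of the fitting execution interval to use."""
    mag = -omega if omega <= 0 else omega  # omega <= 0 uses the symmetric interval
    return bisect.bisect_left(UPPER, mag)  # UPPER[i-1] < mag <= UPPER[i]


def read_coefficients(index: int) -> list:
    """Read one row of N+1 coefficients from the flat ROM image."""
    base = index * (N + 1)
    return ROM[base:base + N + 1]
```

`bisect_left` returns the first index whose upper endpoint is at least $|\omega|$, which matches the half-open intervals $(q_i, q_{i+1}]$; any value above 13.816 lands on the tail row.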
Step 12: Feed the operand $\omega$ and the fitting polynomial coefficients read for it into the polynomial operation module for the fitting calculation. If $\omega \in (0, +\infty)$, the resulting fitting result is the output value of the fitting hardware circuit; if $\omega \in (-\infty, 0]$, the fitting result and the constant 1 are fed into the floating-point subtracter, and the resulting difference is the output value of the fitting hardware circuit.
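The subtraction in step 12 follows from the symmetry $f(-x) = 1 - f(x)$. For the logistic sigmoid $f(x) = 1/(1+e^{-x})$ (the standard definition, assumed here because this section does not restate it), the identity and its consequence for negative operands are:

```latex
1 - f(x) = 1 - \frac{1}{1+e^{-x}} = \frac{e^{-x}}{1+e^{-x}} = \frac{1}{e^{x}+1} = f(-x),
\qquad\text{so for } \omega \le 0:\quad f(\omega) = 1 - f(-\omega) \approx 1 - p(-\omega),
```

where $p$ is the fitting polynomial of the positive-side interval containing $-\omega \ge 0$. This is why only the positive-side coefficients need to be stored.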
The resulting polynomial operation module is shown in Fig. 6. This embodiment of scheme two uses the IEEE 754 single-precision floating-point data format, with an operational precision of at least $10^{-6}$. As the structure diagram of the fifth-order polynomial fitting hardware circuit shows, it comprises 9 multipliers, 5 adders and 6 reg register units. The polynomial implemented is $p(x) = A x^5 + B x^4 + C x^3 + D x^2 + E x + F$, and Result is the final output of the operation. The computation proceeds as follows:
Step a: The source operand $x$ enters the polynomial operation module and coefficient $E$ is read. $x$ enters multiplier Multi_1, which computes $E x$ and passes it to the next stage; $x$ enters multiplier Multi_2, which computes $x^2$ and passes it to the next stage; and $x$ passes through reg_1, a two-stage register delay, to take part in the next-stage operation. The 2 multipliers of the first stage operate in parallel, and every multiplier has a pipeline depth of 2 stages;
Step b: Coefficient $F$ and $E x$ are read into adder Add_1, which computes $E x + F$ and passes the result to the next stage; coefficient $D$ and $x^2$ enter multiplier Multi_3, which computes $D x^2$ and passes it to the next stage; $x^2$ and the delayed $x$ enter multiplier Multi_4, which computes $x^3$ and passes it to the next stage; and the $x$ delayed at the previous stage enters reg_2 for a further two-stage delay, waiting to take part in the next-stage operation. The 3 floating-point units of the second stage operate in parallel, each with a pipeline depth of 2 stages;
Step c: $(E x + F)$ and $D x^2$ are read into adder Add_2, which computes $D x^2 + E x + F$ and passes it to the next stage; coefficient $C$ and $x^3$ enter multiplier Multi_5, which computes $C x^3$ and passes it to the next stage; $x^3$ and the $x$ delayed at the previous stage enter Multi_6, which computes $x^4$ and passes it to the next stage; and the $x$ delayed at the previous stage enters reg_3 for a further two-stage delay, waiting to take part in the next-stage operation. The 3 floating-point units of the third stage operate in parallel, each with a pipeline depth of 2 stages;
Step d: $(D x^2 + E x + F)$ and $C x^3$ are read into adder Add_3, which computes $C x^3 + D x^2 + E x + F$ and passes it to the next stage; coefficient $B$ and $x^4$ enter multiplier Multi_7, which computes $B x^4$ and passes it to the next stage; and $x^4$ and the $x$ delayed at the previous stage enter multiplier Multi_8, which computes $x^5$ and passes it to the next stage. The 3 floating-point units of the fourth stage operate in parallel, each with a pipeline depth of 2 stages;
Step e: $(C x^3 + D x^2 + E x + F)$ and $B x^4$ are read into adder Add_4, which computes $B x^4 + C x^3 + D x^2 + E x + F$ and passes it to the next stage; coefficient $A$ and $x^5$ enter multiplier Multi_9, which computes $A x^5$ and passes it to the next stage. The 2 floating-point units of the fifth stage operate in parallel, each with a pipeline depth of 2 stages;
Step f: Adder Add_5 computes $A x^5 + B x^4 + C x^3 + D x^2 + E x + F$ and outputs the result; the adder has a pipeline depth of 2 stages;
Step g: If the source operand lies in $(0, +\infty)$, the result of the previous stage is the final result and is output directly. If the source operand lies in $(-\infty, 0)$, subtracter Add_6 subtracts the previous stage's result from 1, and this difference is output directly as the final result; the subtracter has a pipeline depth of 2 stages.
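The data flow of steps a to g can be modelled in software to check that the staged partial sums really produce $p(x)$ and that the operator count matches step 8 ($n = 5$ adders and $2n - 1 = 9$ multipliers, plus the subtracter). This is a functional sketch only: it uses Python floats (double precision) rather than IEEE 754 single precision, ignores pipeline timing, and assumes the polynomial is evaluated at $|\omega|$ for negative operands, which the symmetry implies but the text does not state. The names `pipeline_eval` and `fit_sigmoid` are hypothetical.

```python
from collections import Counter


def pipeline_eval(x: float, coeffs):
    """Evaluate p(x) = A x^5 + B x^4 + C x^3 + D x^2 + E x + F stage by stage.

    Returns (p, multiplier_ops, adder_ops). The reg_i variables only forward x;
    in hardware each is a two-deep delay, giving (n - 2) * k = 3 * 2 = 6 units.
    """
    A, B, C, D, E, F = coeffs
    ops = Counter()

    def mul(a, b):
        ops["mul"] += 1
        return a * b

    def add(a, b):
        ops["add"] += 1
        return a + b

    ex, x2, reg1 = mul(E, x), mul(x, x), x                                # a: Multi_1, Multi_2
    s1, dx2, x3, reg2 = add(ex, F), mul(D, x2), mul(x2, reg1), reg1       # b: Add_1, Multi_3, Multi_4
    s2, cx3, x4, reg3 = add(s1, dx2), mul(C, x3), mul(x3, reg2), reg2     # c: Add_2, Multi_5, Multi_6
    s3, bx4, x5 = add(s2, cx3), mul(B, x4), mul(x4, reg3)                 # d: Add_3, Multi_7, Multi_8
    s4, ax5 = add(s3, bx4), mul(A, x5)                                    # e: Add_4, Multi_9
    p = add(s4, ax5)                                                      # f: Add_5
    return p, ops["mul"], ops["add"]


def fit_sigmoid(omega: float, coeffs_for):
    """Stage g: negative operands use the symmetric interval and 1 - p (Add_6)."""
    mag = -omega if omega <= 0 else omega
    p, _, _ = pipeline_eval(mag, coeffs_for(mag))
    return p if omega > 0 else 1.0 - p
```

Each line of `pipeline_eval` corresponds to one hardware stage whose operators run in parallel; `coeffs_for` stands in for the judgment module and coefficient ROM and must return the tuple $(A, B, C, D, E, F)$ for the interval containing its argument.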
Once these steps are complete, the sigmoid function fitting of the present invention is done. Counting the clock cycles of each step in this example: every operation stage has a pipeline depth of 2, there are 7 stages in total, and fitting a single source operand takes 15 clock cycles. The fitting precision is at least $10^{-6}$, and the maximum mean square error does not exceed $8.74 \times 10^{-14}$. This fitting precision is much higher than the best fitting precision in the existing technology, resource consumption is lower, and because the data format is IEEE 754 single-precision floating point, the circuit is well suited to high-precision, high-speed real-time computation.
In scheme two, the coefficient memory module stores fewer fitting polynomial coefficients, which reduces storage resource consumption and the workload of computing the fitting polynomials. Because the same fitting polynomials are used on both sides of the origin, the two fit intervals that are symmetric about the origin have identical fitting precision, which makes error analysis easier. Although the fitting of the entire sigmoid function meets the speed requirements of real-time high-speed operation, the additional subtracter and pipeline stage increase computational resource consumption and reduce operating speed.
In summary, the present invention uses the Remez approximation algorithm to compute the sigmoid function quickly and effectively with relatively high fitting precision. For IEEE 754 single-precision floating-point operation, the maximum error under high-precision hardware implementation does not exceed $10^{-6}$; for data not in IEEE 754 format, the same structure can also achieve better fitting precision than the existing technology under equivalent technical requirements. The circuit structure is simple and of limited scale, and a small number of adders and multipliers suffice to complete the computation, which greatly reduces computational resource consumption and offers high flexibility. While meeting the speed and parallelism requirements, the invention effectively improves the precision and performance of sigmoid function fitting and addresses the bottleneck faced by existing technology.
Claims (1)
1. A sigmoid function fitting hardware circuit based on the Remez approximation algorithm, characterized in that it is carried out as follows:
Step 1: Determine the order $n$ of the fitting polynomial according to the given fitting precision $u$, the computational resources and the storage resources;
Step 2: According to the fitting precision $u$, obtain the fit interval $[a, b]$ of the sigmoid function $f(x)$ using formula (1);
Step 3: Using the symmetry shown in formula (2), with the origin 0 as the center of symmetry, divide the fit interval $[a, b]$ into $2m+2$ subintervals $[a, q_1], (q_1, q_2], \ldots, (q_m, 0], (0, q_{m+1}], \ldots, (q_{2m}, b]$, where $a, q_1, q_2, \ldots, q_m, 0, q_{m+1}, \ldots, q_{2m}, b$ are the endpoint values of the $2m+2$ subintervals and $q_1, q_2, \ldots, q_m, q_{m+1}, \ldots, q_{2m}$ are the partition endpoint values of the $2m$ subintervals. The partition endpoint values of the $2m$ subintervals form, in order, the endpoint set $Q = \{Q_0, Q_1, \ldots, Q_t, \ldots, Q_{2m-1}\}$, where $Q_t$ denotes the endpoint value of the $t$-th subinterval among these partition endpoint values. This gives the piecewise intervals $[Q_0, Q_1], [Q_1, Q_2], \ldots, [Q_t, Q_{t+1}], \ldots, [Q_{2m-1}, Q_{2m}]$, with $t = 0, 1, \ldots, 2m-1$;
$f(-x) = 1 - f(x)$  (2)
Step 4: Combine the order $n$ with each of the $m$ piecewise intervals on $(0, b]$ to form $m$ vectors $[n, Q_m, Q_{m+1}], [n, Q_{m+1}, Q_{m+2}], \ldots, [n, Q_\varepsilon, Q_{\varepsilon+1}], \ldots, [n, Q_{2m-1}, Q_{2m}]$, where $\varepsilon = m, m+1, \ldots, 2m-1$ and $[n, Q_\varepsilon, Q_{\varepsilon+1}]$ denotes the $\varepsilon$-th vector. Substitute the $m$ vectors into the Remez algorithm one after another to obtain the approximation accuracy of each piecewise interval, $u_m'', u_{m+1}'', \ldots, u_t'', \ldots, u_{2m-1}''$;
Step 4.1: Using formula (5), obtain the alternation point set of $n+2$ Chebyshev polynomial points for the $\varepsilon$-th vector $[n, Q_\varepsilon, Q_{\varepsilon+1}]$, and take this $\varepsilon$-th alternation point set as the $\varepsilon$-th initial point set. This gives an initial point set for each of the $m$ vectors;
In formula (5), $\lambda = 0, 1, \ldots, n+1$;
Step 4.2: Using the $\varepsilon$-th initial point set, solve the system of linear equations shown in formula (6), and from its solution obtain the $\varepsilon$-th initial approximating polynomial $p_\varepsilon'(x)$;
Step 4.3: In the $\varepsilon$-th piecewise interval $[Q_\varepsilon, Q_{\varepsilon+1}]$, find the independent variable at which $|f(x) - p_\varepsilon'(x)|$ attains its maximum, and characterize this independent variable by … and …;
If … and …, replace … with …;
If … and …, replace … with …;
If … and …, replace … with …;
where $\beta = 1, 2, \ldots, n$. This yields the update point set of the $\varepsilon$-th initial point set;
Step 4.4: Using the update point set of the $\varepsilon$-th initial point set, solve the system of linear equations shown in formula (6) to obtain its updated solution, and from this updated solution obtain the $\varepsilon$-th updated approximating polynomial;
Check whether $|u_\varepsilon'' - u_\varepsilon'| \le eps$ holds. If it does, take $u_\varepsilon''$ as the approximation accuracy of the $\varepsilon$-th piecewise interval $[Q_\varepsilon, Q_{\varepsilon+1}]$; otherwise, repeat steps 4.3 and 4.4 until $|u_\varepsilon'' - u_\varepsilon'| \le eps$ holds. Here eps is the convergence tolerance for the approximation error;
Step 5: Check in turn whether each approximation accuracy $u_m'', u_{m+1}'', \ldots, u_t'', \ldots, u_{2m-1}''$ satisfies the fitting precision $u$. If it does, the corresponding piecewise interval becomes a fitting execution interval, and the coefficients of the corresponding approximating polynomial become the fitting polynomial coefficients of that fitting execution interval. If it does not, adjust the partition endpoint value of the piecewise interval whose approximation accuracy fails, and return to step 4. Repeat until $m$ fitting execution intervals that satisfy the fitting precision $u$, together with $m$ sets of fitting polynomial coefficients, have been obtained;
Step 6: If the independent variable $x$ of the sigmoid function $f(x)$ lies in the interval $(b, +\infty)$, take $(b, +\infty)$ as a fitting execution interval; the fitting polynomial for $(b, +\infty)$ has a constant-term coefficient of 1 and all other coefficients equal to 0. This yields $m+1$ fitting polynomials of order $n$ and completes the fitting of the sigmoid function;
Step 7: Store the coefficients of the $m+1$ fitting polynomials of order $n$ in ROM to form the coefficient memory module;
Step 8: Based on the fitting polynomial of order $n$, design the polynomial operation module using $n$ floating-point adders, $2n-1$ floating-point multipliers and $(n-2) \times k$ register units, and add one floating-point subtracter at the output of the polynomial operation module, where $k$ is the pipeline depth of the floating-point adders, the floating-point multipliers and the floating-point subtracter;
Step 9: Design the judgment module according to the $2m+2$ fitting execution intervals. The polynomial operation module, the coefficient memory module, the floating-point subtracter and the judgment module together form the fitting hardware circuit;
Step 10: Input an operand $\omega$ as the input value of the fitting hardware circuit, and use the judgment module to determine the fitting execution interval in which the operand $\omega$ lies;
Step 11: If $\omega \in (0, +\infty)$, read from the coefficient memory module the coefficients of the fitting polynomial for the fitting execution interval in which the operand $\omega$ lies;
if $\omega \in (-\infty, 0]$, read from the coefficient memory module the coefficients of the fitting polynomial for the interval symmetric to the fitting execution interval in which the operand $\omega$ lies;
Step 12: Feed the operand $\omega$ and the fitting polynomial coefficients read for it into the polynomial operation module for the fitting calculation. If $\omega \in (0, +\infty)$, the resulting fitting result is the output value of the fitting hardware circuit; if $\omega \in (-\infty, 0]$, the fitting result and the constant 1 are fed into the floating-point subtracter, and the resulting difference is the output value of the fitting hardware circuit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710416069.6A CN107247992B (en) | 2014-12-30 | 2014-12-30 | A kind of sigmoid Function Fitting hardware circuit based on column maze approximate algorithm |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410850470.7A CN104484703B (en) | 2014-12-30 | 2014-12-30 | A kind of sigmoid Function Fitting hardware circuits based on row maze approximate algorithm |
CN201710416069.6A CN107247992B (en) | 2014-12-30 | 2014-12-30 | A kind of sigmoid Function Fitting hardware circuit based on column maze approximate algorithm |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410850470.7A Division CN104484703B (en) | 2014-12-30 | 2014-12-30 | A kind of sigmoid Function Fitting hardware circuits based on row maze approximate algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107247992A (en) | 2017-10-13 |
CN107247992B (en) | 2019-08-30 |
Family ID: 52759244
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710416069.6A Active CN107247992B (en) | 2014-12-30 | 2014-12-30 | A kind of sigmoid Function Fitting hardware circuit based on column maze approximate algorithm |
CN201410850470.7A Active CN104484703B (en) | 2014-12-30 | 2014-12-30 | A kind of sigmoid Function Fitting hardware circuits based on row maze approximate algorithm |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410850470.7A Active CN104484703B (en) | 2014-12-30 | 2014-12-30 | A kind of sigmoid Function Fitting hardware circuits based on row maze approximate algorithm |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN107247992B (en) |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102359265B1 (en) * | 2015-09-18 | 2022-02-07 | 삼성전자주식회사 | Processing apparatus and method for performing operation thereof |
CN105893159B (en) * | 2016-06-21 | 2018-06-19 | 北京百度网讯科技有限公司 | Data processing method and device |
US10552732B2 (en) * | 2016-08-22 | 2020-02-04 | Kneron Inc. | Multi-layer neural network |
CN106682732B (en) * | 2016-12-14 | 2019-03-29 | 浙江大学 | A kind of Gauss error function circuit applied to neural network |
CN108205518A (en) * | 2016-12-19 | 2018-06-26 | 上海寒武纪信息科技有限公司 | Obtain device, method and the neural network device of functional value |
US10997492B2 (en) * | 2017-01-20 | 2021-05-04 | Nvidia Corporation | Automated methods for conversions to a lower precision data format |
CN107480771B (en) * | 2017-08-07 | 2020-06-02 | 北京中星微人工智能芯片技术有限公司 | Deep learning-based activation function realization method and device |
CN107704422A (en) * | 2017-10-13 | 2018-02-16 | 武汉精测电子集团股份有限公司 | A kind of parallel calculating method and device based on PLD |
CN108154224A (en) * | 2018-01-17 | 2018-06-12 | 北京中星微电子有限公司 | For the method, apparatus and non-transitory computer-readable medium of data processing |
US10977854B2 (en) | 2018-02-27 | 2021-04-13 | Stmicroelectronics International N.V. | Data volume sculptor for deep learning acceleration |
US11687762B2 (en) * | 2018-02-27 | 2023-06-27 | Stmicroelectronics S.R.L. | Acceleration unit for a deep learning engine |
US11586907B2 (en) | 2018-02-27 | 2023-02-21 | Stmicroelectronics S.R.L. | Arithmetic unit for deep learning acceleration |
CN108537332A (en) * | 2018-04-12 | 2018-09-14 | 合肥工业大学 | A kind of Sigmoid function hardware-efficient rate implementation methods based on Remez algorithms |
CN109934336B (en) * | 2019-03-08 | 2023-05-16 | 江南大学 | Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform |
CN110070170A (en) * | 2019-05-23 | 2019-07-30 | 福州大学 | PSO-BP neural network sensor calibrating system and method based on MCU |
CN110647718B (en) * | 2019-09-26 | 2023-07-25 | 中昊芯英(杭州)科技有限公司 | Data processing method, device, equipment and computer readable storage medium |
CN110837885B (en) * | 2019-10-11 | 2021-03-02 | 西安电子科技大学 | Sigmoid function fitting method based on probability distribution |
CN110796247B (en) * | 2020-01-02 | 2020-05-19 | 深圳芯英科技有限公司 | Data processing method, device, processor and computer readable storage medium |
CN111191766B (en) * | 2020-01-02 | 2023-05-16 | 中昊芯英(杭州)科技有限公司 | Data processing method, device, processor and computer readable storage medium |
CN111191779B (en) * | 2020-01-02 | 2023-05-30 | 中昊芯英(杭州)科技有限公司 | Data processing method, device, processor and computer readable storage medium |
US11507831B2 (en) | 2020-02-24 | 2022-11-22 | Stmicroelectronics International N.V. | Pooling unit for deep learning acceleration |
CN111680782B (en) * | 2020-05-20 | 2022-09-13 | 河海大学常州校区 | FPGA-based RBF neural network activation function implementation method |
CN112528211B (en) * | 2020-12-17 | 2022-12-20 | 中电科思仪科技(安徽)有限公司 | Method for fitting solar cell IV curve |
CN112859086B (en) * | 2021-01-25 | 2024-02-27 | 聚融医疗科技(杭州)有限公司 | Self-adaptive rapid arctangent system, method and ultrasonic imaging device |
CN114567396A (en) * | 2022-02-28 | 2022-05-31 | 哲库科技(北京)有限公司 | Wireless communication method, fitting method of nonlinear function, terminal and equipment |
CN114900257B (en) * | 2022-05-26 | 2024-05-14 | Oppo广东移动通信有限公司 | Baseband chip, channel estimation method, data processing method and equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1741394A (en) * | 2005-09-16 | 2006-03-01 | 北京中星微电子有限公司 | Method for computing nonlinear function in inverse quantization formula |
CN102708381A (en) * | 2012-05-09 | 2012-10-03 | 江南大学 | Improved extreme learning machine combining learning thought of least square vector machine |
CN103729688A (en) * | 2013-12-18 | 2014-04-16 | 北京交通大学 | Section traffic neural network prediction method based on EMD |
CN103809930A (en) * | 2014-01-24 | 2014-05-21 | 天津大学 | Design method of double-precision floating-point divider and divider |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101527010B (en) * | 2008-03-06 | 2011-12-07 | 上海理工大学 | Hardware realization method and system for artificial neural network algorithm |
2014
- 2014-12-30 CN CN201710416069.6A patent/CN107247992B/en Active
- 2014-12-30 CN CN201410850470.7A patent/CN104484703B/en Active
Non-Patent Citations (1)
Title |
---|
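"Research and Design of FPGA-Based Neural Network Hardware Implementation" (基于FPGA的神经网络硬件实现的研究与设计); Liu Peilong; China Master's Theses Full-text Database, Information Science and Technology; 2013-07-15 (No. 7); pp. I135-626 |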
Also Published As
Publication number | Publication date |
---|---|
CN104484703B (en) | 2017-06-30 |
CN104484703A (en) | 2015-04-01 |
CN107247992A (en) | 2017-10-13 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2020-12-31. Address after: No. 50, Meilin Avenue, Huangshan Economic Development Zone, Anhui Province, 245000. Patentee after: Huangshan Development Investment Group Co., Ltd. Address before: No. 193 Tunxi Road, Baohe District, Hefei, Anhui Province, 230009. Patentee before: Hefei University of Technology |