CN107247992B

CN107247992B - A sigmoid function fitting hardware circuit based on the Remez approximation algorithm

Info

Publication number: CN107247992B
Application number: CN201710416069.6A
Authority: CN
Inventors: 宋宇鲲; 王浩; 张多利; 杜高明
Original assignee: Hefei University of Technology
Current assignee: Huangshan Development Investment Group Co.,Ltd.
Priority date: 2014-12-30
Filing date: 2014-12-30
Publication date: 2019-08-30
Anticipated expiration: 2034-12-30
Also published as: CN104484703B; CN104484703A; CN107247992A

Abstract

The invention discloses a sigmoid function fitting hardware circuit based on the Remez approximation algorithm, characterized in that it is obtained by the following steps: 1. determine the order of the fitting polynomial; 2. obtain the fitting interval of the sigmoid function; 3. obtain the piecewise intervals; 4. obtain the fitting polynomials; 5. design the coefficient storage module; 6. design the polynomial operation module; 7. design the judgment module; 8. obtain the fitting hardware circuit; 9. determine the fitting execution interval in which the operand lies; 10. read the fitting polynomial coefficients; 11. perform the fitting calculation in the polynomial operation module. The invention can improve operational precision, increase operation speed, and make the operation structure more flexible while reducing hardware resource consumption.

Description

A sigmoid function fitting hardware circuit based on the Remez approximation algorithm

This application is a divisional application of the application filed on December 30, 2014, with application No. 2014108504707, entitled "A sigmoid function fitting hardware circuit based on the Remez approximation algorithm".

Technical field

The present invention relates to the field of artificial neural networks, and specifically to a sigmoid function fitting hardware circuit based on the Remez approximation algorithm.

Background art

"Neural network" is short for artificial neural network, and the research and application of neural networks is one of the current research hotspots. The advantages of neural networks lie mainly in two aspects: parallelism, and powerful nonlinear information processing and learning capability. At present, the theoretical basis and working principles of many neural network models have matured, so further research on their application in related fields such as signal processing, control systems, and speech recognition has become a hotspot. Compared with software simulation, hardware-implemented neural networks offer high processing speed and a high degree of parallelism, and more readily meet the real-time operation requirements of neural networks.

When a neural network is implemented on an FPGA, there are two difficulties: one is the data representation, and the other is the method of approximating the neural network activation function. These two points determine the hardware resource utilization efficiency and the approximation precision. Neural network activation functions take many forms; the sigmoid function is the most widely used activation function in neural networks and also the most difficult to implement, so it is an important part of FPGA implementations of neural networks.

At present, FPGA implementation methods for the sigmoid function include the direct lookup table, piecewise linear approximation, polynomial approximation, the CORDIC algorithm, genetic algorithms, and so on. The direct lookup table method (Zhiliang Nie, 2012; Alexander Gomperts, 2010) stores sigmoid results in a storage module and looks up and reads the result directly according to the input operand; this method consumes a large amount of storage, and its hardware precision is not high. The piecewise linear approximation method (Manish Panicker, 2012) uses a 3-segment piecewise linear approximation over the range (-5, 5) with a 32-bit fixed-point number format; it uses few operation and storage resources, but its precision is low, with a maximum mean square error of 0.00187. The CORDIC method (Xi Chen, 2006) combines the CORDIC algorithm with a lookup table, using a custom 16-bit floating-point input format and a custom 32-bit floating-point output format; it consumes a large amount of operation resources, and its precision is very low. The genetic algorithm method (Bharat Kishore Bharkhada, 2004) fits piecewise cubic polynomials with integer coefficients over the range [0, 8] using a genetic algorithm, with a 16-bit fixed-point number format; its operation resources are modest and its storage resources low, but its precision is not high, with an absolute error of 2.4376 × 10^-3. Polynomial approximation is the most common approach. The traditional Taylor series expansion method consumes a large amount of operation resources, and its precision is very low. The more classical piecewise polynomial approximation algorithm (Joao O. P. Pinto, 2006) uses piecewise 5th-order polynomials, with low storage resources, modest operation resources, and relatively high precision; its maximum error is 8 × 10^-5. This is the best fitting precision achieved in the prior art, but it still cannot meet the requirements of high-precision computation.

As for the choice of data format, most of the above methods use custom floating-point formats to improve precision, whereas in real-time high-speed processing the data format is usually the 32-bit single-precision floating-point format of the IEEE 754 standard. When a module using a custom data format communicates with other processing modules, the data format must also be converted, so the communication cost is high. As for reducing resource consumption, the lookup table method is used to reduce operation resource consumption; although it yields the result and greatly reduces operation resources, it greatly increases storage resources. As for precision, because of the limitations of the algorithms themselves and overall resource considerations, the precision of hardware implementations in the current state of the art is generally not high and falls far short of the requirements of real-time high-precision processing. These are all bottleneck problems in urgent need of a solution.

Summary of the invention

To overcome the above deficiencies of the prior art, the present invention proposes a sigmoid function fitting hardware circuit based on the Remez approximation algorithm, so as to improve operational precision, increase operation speed, and make the operation structure more flexible while reducing hardware resource consumption.

To solve the technical problem, the present invention adopts the following technical solution:

The sigmoid function fitting hardware circuit based on the Remez approximation algorithm of the present invention is characterized in that it is obtained by the following steps:

Step 1: according to a given fitting precision u, the operation resources, and the storage resources, determine the order n of the fitting polynomial;

Step 2: according to the fitting precision u, use formula (1) to obtain the fitting interval [a, b] of the sigmoid function f(x);

Step 3: using the symmetry shown in formula (2), with the origin 0 as the center of symmetry, divide the fitting interval [a, b] into 2m+2 subintervals [a, q₁], (q₁, q₂], …, (q_m, 0], (0, q_{m+1}], …, (q_{2m}, b]; a, q₁, q₂, …, q_m, 0, q_{m+1}, …, q_{2m}, b denote the endpoints of the 2m+2 subintervals; q₁, q₂, …, q_m, q_{m+1}, …, q_{2m} denote the scaling endpoints of the 2m subintervals; the scaling endpoints of the 2m subintervals form, in order, the endpoint set Q = {Q₀, Q₁, …, Q_t, …, Q_{2m-1}}; Q_t denotes the endpoint of the t-th subinterval among the scaling endpoints of the 2m subintervals; the piecewise intervals [Q₀, Q₁], [Q₁, Q₂], …, [Q_t, Q_{t+1}], …, [Q_{2m-1}, Q_{2m}] are thereby obtained; t = 0, 1, …, 2m-1;

f(-x) = 1 - f(x)    (2)

Step 4: combine the order n with each of the m piecewise subintervals on the interval (0, b] to form m vectors [n, Q_m, Q_{m+1}], [n, Q_{m+1}, Q_{m+2}], …, [n, Q_ε, Q_{ε+1}], …, [n, Q_{2m-1}, Q_{2m}]; ε = m, m+1, …, 2m-1, where [n, Q_ε, Q_{ε+1}] denotes the ε-th vector; substitute the m vectors into the Remez algorithm in turn, thereby obtaining the approximation precisions u_m'', u_{m+1}'', …, u_t'', …, u_{2m-1}'' corresponding to the respective piecewise intervals;

Step 4.1: using formula (5), obtain the set of n+2 alternation points of the Chebyshev polynomial corresponding to the ε-th vector [n, Q_ε, Q_{ε+1}], and take this ε-th alternation point set as the ε-th initial point set, thereby obtaining the initial point set corresponding to each of the m vectors;

In formula (3), λ = 0, 1, …, n+1;

Step 4.2: using the ε-th initial point set, solve the system of linear equations shown in formula (6), and from its solution obtain the ε-th initial approximating polynomial;

Step 4.3: within the ε-th piecewise interval [Q_ε, Q_{ε+1}], find the value of the independent variable at which |f(x) - p_ε'(x)| attains its maximum; this value is characterized by … and …;

If … and …, then … is replaced by …;

If … and …, then … is replaced by …, β = 1, 2, …, n; the updated point set of the ε-th initial point set is thereby obtained;

Step 4.4: using the updated point set of the ε-th initial point set, solve for the updated solution of the system of linear equations shown in formula (6), and from the updated solution obtain the ε-th updated approximating polynomial;

Determine whether |u_ε'' - u_ε'| ≤ eps holds; if so, take u_ε'' as the approximation precision corresponding to the ε-th piecewise interval [Q_ε, Q_{ε+1}]; otherwise, repeat steps 4.3 to 4.4 until |u_ε'' - u_ε'| ≤ eps holds; eps denotes the convergence control precision of the approximation error;

Step 5: check in turn whether the approximation precisions u_m'', u_{m+1}'', …, u_t'', …, u_{2m-1}'' meet the fitting precision u; if an approximation precision meets it, the corresponding piecewise interval is a fitting execution interval, and the coefficients of the corresponding approximating polynomial are the fitting polynomial coefficients of that fitting execution interval; if not, scale the scaling endpoints of the piecewise interval whose approximation precision does not meet it and return to step 4, until m fitting execution intervals and m sets of fitting polynomial coefficients meeting the fitting precision u are obtained;

Step 6: if the independent variable x of the sigmoid function f(x) lies in the interval (b, +∞), the interval (b, +∞) is taken as a fitting execution interval, and the constant-term coefficient of the fitting polynomial corresponding to (b, +∞) is 1 while all its other coefficients are 0; m+1 fitting polynomials of order n are thereby obtained, completing the fitting of the sigmoid function;

Step 7: store the coefficients of the m+1 fitting polynomials of order n in ROM, forming the coefficient storage module;

Step 8: according to the n-th order fitting polynomial, design the polynomial operation module using n floating-point adders, 2n-1 floating-point multipliers, and (n-2) × k register units, and place a floating-point subtractor at the output of the polynomial operation module; k is the number of pipeline stages of the floating-point adder, the floating-point multiplier, and the floating-point subtractor;

Step 9: design the judgment module according to the 2m+2 fitting execution intervals; the polynomial operation module, the coefficient storage module, the floating-point subtractor, and the judgment module together constitute the fitting hardware circuit;

Step 10: input an operand ω as the input value of the fitting hardware circuit, and use the judgment module to determine the fitting execution interval in which the operand ω lies;

If ω ∈ (0, +∞), read from the coefficient storage module the coefficients of the fitting polynomial corresponding to the fitting execution interval in which the operand ω lies;

If ω ∈ (-∞, 0], read from the coefficient storage module the coefficients of the fitting polynomial corresponding to the interval symmetric to the fitting execution interval in which the operand ω lies;

Step 12: feed the operand ω and the coefficients of its corresponding fitting polynomial into the polynomial operation module to perform the fitting calculation; if ω ∈ (0, +∞), the fitting result obtained is the output value of the fitting hardware circuit; if ω ∈ (-∞, 0], the fitting result and the constant 1 are fed into the floating-point subtractor, and the result of the subtraction is the output value of the fitting hardware circuit.

Compared with the prior art, the beneficial effects of the present invention are as follows:

1. The Remez approximation algorithm used by the invention can meet different design requirements. If the design requires very low operation resource consumption together with high precision, m can be increased appropriately without changing the fitting precision u, which increases the number of subintervals and lowers the order n of the fitting polynomial so that the design meets the requirement. If the design requires low storage resource consumption together with high precision, m can be decreased appropriately without changing the fitting precision u, which reduces the number of subintervals and hence the coefficient storage consumption so that the design meets the requirement. This overcomes the low fitting precision and high resource consumption of the prior art and gives the polynomial fitting hardware circuit greater flexibility in implementing polynomial fitting.

2. The invention uses a polynomial coefficient storage module, which makes the hardware circuit design highly extensible: for a different fitting scheme, only the coefficients stored in the storage module need to be rewritten.

3. The invention uses n floating-point adders and 2n-1 floating-point multipliers, and uses (n-2) × k register units to hold the operand and the intermediate results of the corresponding stages, so that the circuit can perform pipelined computation on single-precision floating-point numbers; this increases operation speed and allows the design to meet the requirements of high-speed real-time operation.

4. The invention uses a judgment module, thereby combining the lookup table method with piecewise nonlinear approximation and extending the execution interval of the fitted function, so that a corresponding result can be obtained for any operand value over the entire real line.

5. By exploiting the symmetry of the sigmoid function, scheme two needs to fit only the interval (0, b] with the Remez algorithm. Without affecting precision, this halves the resource consumption of the coefficient storage module and halves the number of fitting polynomial coefficients to be solved.

6. By exploiting the symmetry of the sigmoid function, scheme two adds a subtractor outside the polynomial operation module and performs a subtraction on the fitting result for operands in the interval (-∞, a], so that the final result is obtained quickly and accurately without affecting precision.

7. The invention can use different data formats. For single-precision floating-point data in IEEE 754 format, it achieves a fitting precision no worse than 10^-6. For other custom floating-point formats, at equal resource consumption the circuit of the invention achieves higher fitting precision than other circuits.

Brief description of the drawings

Fig. 1 is a schematic diagram of the hardware circuit of scheme one of the invention;

Fig. 2 is a flow diagram of the operation of scheme one of the invention;

Fig. 3 is an example implementation of the polynomial operation circuit structure of scheme one of the invention;

Fig. 4 is a schematic diagram of the hardware circuit of scheme two of the invention;

Fig. 5 is a flow diagram of the operation of scheme two of the invention;

Fig. 6 is an example implementation of the polynomial operation circuit structure of scheme two of the invention.

Specific embodiment

In this embodiment, a sigmoid function fitting hardware circuit based on the Remez approximation algorithm is obtained by the following steps:

Step 2: according to the fitting precision u, use formula (1) to obtain the fitting interval [a, b] of the sigmoid function f(x). For example, in a specific implementation, given the fitting precision u = 10^-6 and the fitting polynomial order n = 5, the fitting interval obtained is [a, b] = [-13.816, 13.816];
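Formula (1) is not reproduced in this text. As a rough sketch under one plausible assumption, namely that b is chosen so that the sigmoid tail deviates from its limit by at most u (1 - f(b) ≤ u, with a = -b by symmetry), the example values above can be checked in Python. The function name `fit_interval` is hypothetical, and the rule it encodes is an assumption rather than the patent's formula.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_interval(u):
    """Assumed rule (not the patent's formula (1)): b solves 1 - f(b) = u, a = -b."""
    b = math.log((1.0 - u) / u)   # from 1 - 1/(1 + e^-b) = u
    return -b, b

a, b = fit_interval(1e-6)
print(round(a, 3), round(b, 3))   # -13.816 13.816
```

Under this assumption the bound reproduces the interval [-13.816, 13.816] quoted for u = 10^-6.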

Step 3: using the symmetry shown in formula (2), with the origin 0 as the center of symmetry, divide the fitting interval [a, b] into 2m+2 subintervals [a, q₁], (q₁, q₂], …, (q_m, 0], (0, q_{m+1}], …, (q_{2m}, b]; a, q₁, q₂, …, q_m, 0, q_{m+1}, …, q_{2m}, b denote the endpoints of the 2m+2 subintervals; q₁, q₂, …, q_m, q_{m+1}, …, q_{2m} denote the scaling endpoints of the 2m subintervals; the scaling endpoints of the 2m subintervals form, in order, the endpoint set Q = {Q₀, Q₁, …, Q_t, …, Q_{2m-1}}; Q_t denotes the endpoint of the t-th subinterval among the scaling endpoints of the 2m subintervals; the piecewise intervals [Q₀, Q₁], [Q₁, Q₂], …, [Q_t, Q_{t+1}], …, [Q_{2m-1}, Q_{2m}] are thereby obtained; t = 0, 1, …, 2m-1;

In this embodiment, m = 7 is taken, and the fitting interval [-13.816, 13.816] is divided into 14 subintervals, giving the 14 piecewise intervals in order: [-13.816, -10], (-10, -8], (-8, -6], (-6, -4], (-4, -2], (-2, -1], (-1, 0], (0, 1], (1, 2], (2, 4], (4, 6], (6, 8], (8, 10], (10, 13.816];
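The symmetric partition of this example can be sketched in a few lines of Python. The helper name `symmetric_partition` is hypothetical; it simply mirrors the positive breakpoints about the origin, which is how the 14 intervals listed above are laid out.

```python
def symmetric_partition(b, positive_breaks):
    """Mirror the positive breakpoints about 0 and return (low, high) interval pairs."""
    pos = sorted(positive_breaks)
    edges = [-b] + [-q for q in reversed(pos)] + [0.0] + pos + [b]
    return list(zip(edges[:-1], edges[1:]))

intervals = symmetric_partition(13.816, [1, 2, 4, 6, 8, 10])
print(len(intervals))   # 14
print(intervals[0])     # (-13.816, -10)
```

Only the positive breakpoints need to be chosen; the negative half follows from them.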

f(-x) = 1 - f(x)    (2)

By the symmetry shown in formula (2), the sigmoid function f(x) can be fitted over the entire fitting interval to obtain the fitting result; alternatively, only the interval x ∈ (0, +∞) need be fitted, and the fitting result for x ∈ (-∞, 0] can be obtained from formula (2) and the fitting result of the symmetric interval. There are therefore two schemes for fitting the sigmoid function. Scheme one is as follows:
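For reference, the symmetry in formula (2) follows directly from the standard logistic definition of the sigmoid, which is assumed here because the definition of f(x) is not written out in this text:

```latex
f(x) = \frac{1}{1+e^{-x}}, \qquad
f(-x) = \frac{1}{1+e^{x}}
      = \frac{e^{-x}}{1+e^{-x}}
      = 1 - \frac{1}{1+e^{-x}}
      = 1 - f(x).
```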

Step 4: combine the order n with each of the 2m piecewise intervals to form 2m vectors [n, Q₀, Q₁], [n, Q₁, Q₂], …, [n, Q_t, Q_{t+1}], …, [n, Q_{2m-1}, Q_{2m}]; [n, Q_t, Q_{t+1}] denotes the t-th vector. In this embodiment, the 14 vectors are, in order, [5, -13.816, -10], [5, -10, -8], [5, -8, -6], [5, -6, -4], [5, -4, -2], [5, -2, -1], [5, -1, 0], [5, 0, 1], [5, 1, 2], [5, 2, 4], [5, 4, 6], [5, 6, 8], [5, 8, 10], [5, 10, 13.816]; the 14 vectors are substituted into the Remez algorithm in turn, thereby obtaining the approximation precisions u₀'', u₁'', …, u_t'', …, u_{2m-1}'' corresponding to the respective piecewise intervals;

Step 4.1: using formula (3), obtain the set of n+2 alternation points of the Chebyshev polynomial corresponding to the t-th vector [n, Q_t, Q_{t+1}], and take this t-th alternation point set as the t-th initial point set, thereby obtaining the initial point set corresponding to each of the 2m vectors;

In formula (3), k = 0, 1, …, n+1;
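Formula (3) itself is not reproduced in this text. A common choice of n+2 initial alternation points for the Remez algorithm is the set of extrema of the Chebyshev polynomial of degree n+1, mapped onto the interval. The sketch below assumes that choice; the function name is hypothetical.

```python
import math

def chebyshev_alternation_points(n, lo, hi):
    """n+2 extrema of the Chebyshev polynomial T_(n+1), mapped from [-1, 1] to [lo, hi]."""
    mid, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
    # k = 0..n+1; the minus sign makes the points ascend from lo to hi
    return [mid - half * math.cos(math.pi * k / (n + 1)) for k in range(n + 2)]

pts = chebyshev_alternation_points(5, 0.0, 1.0)
print(len(pts))   # 7
```

For n = 5 this gives 7 points, including both interval endpoints, clustered more densely near the ends of the interval.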

Step 4.2: using the t-th initial point set, solve the system of linear equations shown in formula (4), and from its solution obtain the t-th initial approximating polynomial;

Step 4.3: within the t-th piecewise interval [Q_t, Q_{t+1}], find the value of the independent variable at which |f(x) - p_t'(x)| attains its maximum; this value is characterized by … and …;

If … and …, then … is replaced by …;

If … and …, then … is replaced by …; the updated point set of the t-th initial point set is thereby obtained;

Step 4.4: using the updated point set of the t-th initial point set, solve for the updated solution of the system of linear equations shown in formula (4), and from the updated solution obtain the t-th updated approximating polynomial;

Determine whether |u_t'' - u_t'| ≤ eps holds; if so, take u_t'' as the approximation precision corresponding to the t-th piecewise interval [Q_t, Q_{t+1}]; otherwise, repeat steps 4.3 to 4.4 until |u_t'' - u_t'| ≤ eps holds; eps is the convergence control precision of the approximation error;
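Steps 4.1 to 4.4 describe a Remez iteration with a single-point exchange. Because formulas (3) and (4) and the exact exchange conditions are not legible in this text, the following is only a minimal sketch of the textbook version of that iteration, under these assumptions: Chebyshev extrema as initial points, a levelled-error linear system p(x_i) + (-1)^i E = f(x_i), a grid search for the maximum error, and the standard rule that the new point replaces the neighbouring point whose error has the same sign. All function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def poly(coef, x):
    """Evaluate sum(coef[j] * x**j) by Horner's rule."""
    acc = 0.0
    for c in reversed(coef):
        acc = acc * x + c
    return acc

def exchange(pts, x_star, err):
    """Insert x_star so that the error signs at the points keep alternating."""
    s = err(x_star) >= 0
    if x_star < pts[0]:
        return [x_star] + (pts[1:] if (err(pts[0]) >= 0) == s else pts[:-1])
    if x_star > pts[-1]:
        return (pts[:-1] if (err(pts[-1]) >= 0) == s else pts[1:]) + [x_star]
    i = max(k for k in range(len(pts)) if pts[k] <= x_star)
    if (err(pts[i]) >= 0) != s:
        i += 1
    return pts[:i] + [x_star] + pts[i + 1:]

def remez(f, n, lo, hi, eps=1e-12, grid=2000, max_iter=100):
    """Return (coef, u): degree-n coefficients and the largest error found on a grid."""
    pts = [0.5 * (lo + hi) - 0.5 * (hi - lo) * math.cos(math.pi * k / (n + 1))
           for k in range(n + 2)]
    xs = [lo + (hi - lo) * i / grid for i in range(grid + 1)]
    prev_u = None
    for _ in range(max_iter):
        # Levelled-error system: p(x_i) + (-1)^i * E = f(x_i)
        A = [[x ** j for j in range(n + 1)] + [(-1.0) ** i] for i, x in enumerate(pts)]
        coef = solve(A, [f(x) for x in pts])[:-1]
        err = lambda x: f(x) - poly(coef, x)
        x_star = max(xs, key=lambda x: abs(err(x)))
        u = abs(err(x_star))
        if prev_u is not None and abs(u - prev_u) <= eps:   # |u'' - u'| <= eps
            break
        prev_u = u
        pts = exchange(pts, x_star, err)
    return coef, u

coef, u = remez(sigmoid, 5, 0.0, 1.0)
print(f"max error on [0, 1]: {u:.3e}")
```

The sketch works in double precision with a monomial basis, which is adequate for a short interval such as (0, 1]; on wide intervals far from the origin a better-conditioned basis would be advisable.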

Step 5: check in turn whether the approximation precisions u₀'', u₁'', …, u_t'', …, u_{2m-1}'' meet the fitting precision u; if an approximation precision meets it, the corresponding piecewise interval is a fitting execution interval, and the coefficients of the corresponding approximating polynomial are the fitting polynomial coefficients of that fitting execution interval; if not, scale the scaling endpoints of the piecewise interval whose approximation precision does not meet it and return to step 4, until 2m+1 fitting execution intervals and 2m+1 sets of fitting polynomial coefficients meeting the fitting precision u are obtained;

Step 6: if the independent variable x of the sigmoid function f(x) lies in the interval (b, +∞), the interval (b, +∞) is taken as a fitting execution interval, and the constant-term coefficient of the fitting polynomial corresponding to (b, +∞) is 1 while all its other coefficients are 0; if x lies in the interval (-∞, a), the interval (-∞, a) is taken as a fitting execution interval, and all coefficients of the fitting polynomial corresponding to (-∞, a) are 0; 2m+2 fitting polynomials of order n are thereby obtained, completing the fitting of the sigmoid function;

In this embodiment, the constant-term coefficient of the 5th-order fitting polynomial corresponding to the interval (13.816, +∞) is 1, and all its other coefficients are 0; all coefficients of the 5th-order fitting polynomial corresponding to the interval (-∞, -13.816) are 0;

After steps 5 and 6, the 16 fitting execution intervals obtained in this embodiment are: (-∞, -13.816), [-13.816, -11], (-11, -7], (-7, -5], (-5, -3], (-3, -2], (-2, -1], (-1, 0], (0, 1], (1, 2], (2, 3], (3, 5], (5, 7], (7, 11], (11, 13.816], (13.816, +∞), which completes the fitting of the sigmoid function.

Step 7: store the coefficients of the 2m+2 fitting polynomials of order n in ROM, forming the coefficient storage module. In this embodiment, the polynomial coefficients corresponding to the 16 fitting execution intervals are stored in ROM, and address read rules are written according to the storage layout, forming a coefficient lookup table.

Step 8: according to the n-th order fitting polynomial, design the polynomial operation module using n floating-point adders, 2n-1 floating-point multipliers, and (n-2) × k register units; k is the number of pipeline stages of the floating-point adder or floating-point multiplier. In this embodiment, the polynomial operation module is designed with 5 floating-point adders, 9 floating-point multipliers, and 6 reg register units, and each floating-point operator has 2 pipeline stages.
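The resource counts in Step 8 are simple functions of n and k, and the embodiment's figures can be checked with a short Python sketch. The function name is hypothetical; the formulas are the ones stated in the step.

```python
def polynomial_module_resources(n, k):
    """Resource counts stated in Step 8 for an order-n polynomial with k pipeline stages."""
    return {"adders": n, "multipliers": 2 * n - 1, "registers": (n - 2) * k}

print(polynomial_module_resources(5, 2))
# {'adders': 5, 'multipliers': 9, 'registers': 6}
```

For comparison, Horner's rule needs only n multipliers but chains n multiply-add steps one after another; the structure described here spends 2n-1 multipliers so that powers of x and partial sums can be formed side by side in n+1 stages.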

Step 9: design the judgment module according to the 2m+2 fitting execution intervals; the polynomial operation module, the coefficient storage module, and the judgment module constitute the fitting hardware circuit shown in Fig. 1; in Fig. 1, data_i is the input source operand and data_o is the output result.

Step 10: as shown in Fig. 2, input an operand ω as the input value of the fitting hardware circuit, and use the judgment module to determine the fitting execution interval in which the operand ω lies;

Step 11: read from the coefficient storage module the coefficients of the fitting polynomial corresponding to the fitting execution interval in which the operand ω lies;

Step 12: feed the operand ω and the coefficients of its corresponding fitting polynomial into the polynomial operation module to perform the fitting calculation, and take the fitting result obtained as the output value of the fitting hardware circuit.
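The judgment and coefficient-read steps of scheme one amount to an interval search followed by a table read. A minimal software sketch, using the 16 fitting execution intervals of this embodiment, might look as follows. The names `EDGES`, `ROM`, `interval_index`, and `read_coefficients` are hypothetical, the coefficient order (constant term first) is an assumption, and only the two tail entries of the table are filled in because the fitted coefficients are not listed in the text.

```python
from bisect import bisect_left

# Finite edges of the 16 fitting execution intervals of this embodiment
EDGES = [-13.816, -11, -7, -5, -3, -2, -1, 0, 1, 2, 3, 5, 7, 11, 13.816]

N = 5
# Constant term first (assumed order); only the two tail polynomials are given in the text
ROM = {0: [0.0] * (N + 1), 15: [1.0] + [0.0] * N}

def interval_index(x):
    """Index 0..15 of the fitting execution interval containing x."""
    if x == EDGES[0]:
        return 1                   # [-13.816, -11] is closed on the left
    return bisect_left(EDGES, x)   # (lo, hi] intervals: number of edges below x

def read_coefficients(x):
    """Step 11: fetch the coefficient set for the interval containing x."""
    return ROM[interval_index(x)]

print(interval_index(0.5))       # 8, the interval (0, 1]
print(read_coefficients(20.0))   # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

In hardware the same decision would typically be made by comparators on the operand rather than a software search; the sketch only illustrates the mapping from operand to table address.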

The polynomial operation module so designed is shown in Fig. 3. Scheme one of this embodiment uses the IEEE 754 single-precision floating-point data format; Fig. 3 shows the implementation structure of a 5th-order polynomial fitting hardware circuit with a precision no worse than 10^-6, comprising 9 multipliers, 5 adders, and 6 reg register units. The polynomial implemented is p(x) = Ax⁵ + Bx⁴ + Cx³ + Dx² + Ex + F, and Result is the final output of the operation. The detailed operation is as follows:

Step a: the source operand x enters the polynomial operation module and coefficient E is read; x enters multiplier Multi_1, which computes E*x and outputs it to the next stage; x enters multiplier Multi_2, which computes x² and outputs it to the next stage; x enters reg_1 and is held for two pipeline stages to await the next stage's operation; the 2 multipliers of the first stage operate in parallel, and each multiplier has 2 pipeline stages;

Step b: coefficient F and E*x are read into adder Add_1, which computes (E*x + F) and outputs the result to the next stage; coefficient D and x² are read into multiplier Multi_3, which computes D*x² and outputs it to the next stage; x² and x enter multiplier Multi_4, which computes x³ and outputs it to the next stage; the x registered in the previous stage enters reg_2 and is held for a further two pipeline stages to await the next stage's operation; the 3 floating-point operators of the second stage operate in parallel, each with 2 pipeline stages;

Step c: (E*x + F) and D*x² are read into adder Add_2, which computes (D*x² + E*x + F) and outputs it to the next stage; coefficient C and x³ are read into multiplier Multi_5, which computes C*x³ and outputs it to the next stage; x³ and the x registered in the previous stage enter Multi_6, which computes x⁴ and outputs it to the next stage; the x registered in the previous stage enters reg_3 and is held for a further two pipeline stages to await the next stage's operation; the 3 floating-point operators of the third stage operate in parallel, each with 2 pipeline stages;

Step d: (D*x² + E*x + F) and C*x³ are read into adder Add_3, which computes (C*x³ + D*x² + E*x + F) and outputs it to the next stage; coefficient B and x⁴ are read into multiplier Multi_7, which computes B*x⁴ and outputs it to the next stage; x⁴ and the x registered in the previous stage enter multiplier Multi_8, which computes x⁵ and outputs it to the next stage; the 3 floating-point operators of the fourth stage operate in parallel, each with 2 pipeline stages;

Step e: (C*x³ + D*x² + E*x + F) and B*x⁴ are read into adder Add_4, which computes (B*x⁴ + C*x³ + D*x² + E*x + F) and outputs it to the next stage; coefficient A and x⁵ are read into multiplier Multi_9, which computes A*x⁵ and outputs it to the next stage; the 2 floating-point operators of the fifth stage operate in parallel, each with 2 pipeline stages;

Step f: adder Add_5 computes (A*x⁵ + B*x⁴ + C*x³ + D*x² + E*x + F) and outputs it; the adder has 2 pipeline stages; the result is the final result and is output directly;
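The dataflow of steps a to f can be modelled stage by stage in Python to confirm that the nine multiplications and five additions do produce p(x). This is only a functional model in double precision; it does not model single-precision rounding, the register units, or clock timing, and the coefficient values in the usage line are arbitrary test values rather than fitted coefficients.

```python
def pipeline_poly5(x, A, B, C, D, E, F):
    """Follow steps a-f; operations within one stage do not depend on each other."""
    # Stage 1: Multi_1, Multi_2
    ex, x2 = E * x, x * x
    # Stage 2: Add_1, Multi_3, Multi_4
    s1, dx2, x3 = ex + F, D * x2, x2 * x
    # Stage 3: Add_2, Multi_5, Multi_6
    s2, cx3, x4 = s1 + dx2, C * x3, x3 * x
    # Stage 4: Add_3, Multi_7, Multi_8
    s3, bx4, x5 = s2 + cx3, B * x4, x4 * x
    # Stage 5: Add_4, Multi_9
    s4, ax5 = s3 + bx4, A * x5
    # Stage 6: Add_5
    return s4 + ax5

print(pipeline_poly5(2.0, 1, 2, 3, 4, 5, 6))   # 120.0
```

Each stage uses only values produced by earlier stages, which is what allows the operators within a stage to run in parallel.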

Once the above steps are complete, the sigmoid function fitting process of the invention is complete. Counting the clock cycles of each step in this example: each stage has 2 pipeline levels and there are 6 stages in total, so completing the fitting operation for a single source operand takes 13 clock cycles. The fitting precision is no worse than 10^-6, and the maximum mean square error does not exceed 8.74 × 10^-14. This fitting precision is far higher than the best fitting precision in the prior art, the resource consumption is low, and the data format is the IEEE 754 single-precision floating-point format, so the design is well suited to high-precision, high-speed real-time computation.

Scheme one uses fewer floating-point operation resources and fewer floating-point operation stages, so it is faster, but the coefficient storage module must store more fitting polynomial coefficients, which increases storage resources. In addition, although the fitting precision over the whole sigmoid function is high, different fitting polynomials are used on the two sides of the origin, so two fitting intervals that are symmetric about the origin will have different fitting precisions.

Scheme two: steps 4 to 12 may instead be carried out as follows:

Step 4: combine the order n with each of the m piecewise subintervals on the interval (0, b] to form m vectors [n, Q_m, Q_{m+1}], [n, Q_{m+1}, Q_{m+2}], …, [n, Q_ε, Q_{ε+1}], …, [n, Q_{2m-1}, Q_{2m}]; ε = m, m+1, …, 2m-1, where [n, Q_ε, Q_{ε+1}] denotes the ε-th vector; substitute the m vectors into the Remez algorithm in turn, thereby obtaining the approximation precisions u_m'', u_{m+1}'', …, u_t'', …, u_{2m-1}'' corresponding to the respective piecewise intervals;

Step 4.1: using formula (5), obtain the set of n+2 alternation points of the Chebyshev polynomial corresponding to the ε-th vector [n, Q_ε, Q_{ε+1}], and take this ε-th alternation point set as the ε-th initial point set, thereby obtaining the initial point set corresponding to each of the m vectors;

In formula (3), λ = 0, 1, …, n+1;

Step 4.2: using the ε-th initial point set, solve the system of linear equations shown in formula (6), and from its solution obtain the ε-th initial approximating polynomial;

Step 4.3: within the ε-th piecewise interval [Q_ε, Q_{ε+1}], find the value of the independent variable at which |f(x) - p_ε'(x)| attains its maximum; this value is characterized by … and …;

If … and …, then … is replaced by …;

If … and …, then … is replaced by …, β = 1, 2, …, n; the updated point set of the ε-th initial point set is thereby obtained;

Step 4.4: using the updated point set of the ε-th initial point set, solve for the updated solution of the system of linear equations shown in formula (6), and from the updated solution obtain the ε-th updated approximating polynomial;

Determine whether |u_ε'' - u_ε'| ≤ eps holds; if so, take u_ε'' as the approximation precision corresponding to the ε-th piecewise interval [Q_ε, Q_{ε+1}]; otherwise, repeat steps 4.3 to 4.4 until |u_ε'' - u_ε'| ≤ eps holds; eps denotes the convergence control precision of the approximation error.

Step 5 successively judges approximation accuracy u_m”,u_m+1”,…,u_t”,…u_2m-1" whether meet fitting precision u, if satisfied, Then meeting piecewise interval corresponding to approximation accuracy is to be fitted to execute section, meets approximating polynomial corresponding to approximation accuracy Coefficient be fitted execute section fitted polynomial coefficients；If not satisfied, then scaling is unsatisfactory for corresponding to approximation accuracy Scaling endpoint value in piecewise interval, and return step 4 executes, and the m fitting execution section of fitting precision u is met until obtaining With m group fitted polynomial coefficients；

If the independent variable x of step 6, sigmoid function f (x) in section (b ,+∞), then section (b ,+∞) is as fitting Execute section；And it is 0 that the constant term coefficient of polynomial fitting corresponding to section (b ,+∞), which is 1, remaining each term coefficient,；From And m+1 n order polynomial fitting is obtained, complete the fitting of sigmoid function；

In the present embodiment, the constant term coefficient of 5 rank polynomial fittings corresponding to section (13.816 ,+∞) is 1, section Remaining each term coefficient of 5 rank polynomial fittings corresponding to (13.816 ,+∞) is 0；

Through steps 5 and 6, this embodiment obtains 8 fitting execution intervals: (0,1], (1,2], (2,3], (3,5], (5,7], (7,11], (11,13.816], (13.816,+∞), which completes the fitting of the sigmoid function.

Step 7: store the coefficients of the m+1 n-th order fitting polynomials in ROM to form the coefficient storage module. In this embodiment, the polynomial coefficients corresponding to the 8 fitting execution intervals are stored in ROM, and the address writing and reading rules are defined according to the storage rule, forming a coefficient lookup table.

Step 8: according to the n-th order fitting polynomial, design the polynomial operation module using n floating-point adders, 2n−1 floating-point multipliers and (n−2) × k register units, and place one floating-point subtracter at the output of the polynomial operation module; k is the number of pipeline stages of the floating-point adder, floating-point multiplier and floating-point subtracter. In this embodiment, the polynomial operation module is designed with 5 floating-point adders, 9 floating-point multipliers and 6 reg register units, and each floating-point operator has 2 pipeline stages.
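The resource counts in step 8 can be checked with a small Python sketch. The formulas n, 2n−1 and (n−2) × k are taken from the text; how the multipliers are divided between forming the powers of x and scaling them by coefficients is an assumption made for illustration, as are the names `resources` and `eval_counting`.

```python
def resources(n, k):
    """Operator counts stated in step 8 for order n and pipeline depth k."""
    return {"adders": n, "multipliers": 2 * n - 1, "registers": (n - 2) * k}


def eval_counting(coeffs, x):
    """Evaluate a polynomial with explicit powers and count the operations.

    coeffs is ordered from the highest power to the constant term,
    e.g. [A, B, C, D, E, F] for A*x**5 + ... + E*x + F.
    Assumed arrangement: n - 1 multiplies build x**2 .. x**n,
    n multiplies apply the coefficients, n adds sum the terms.
    """
    n = len(coeffs) - 1
    powers = [x]                      # powers[j] holds x**(j + 1)
    muls = 0
    for _ in range(n - 1):
        powers.append(powers[-1] * x)
        muls += 1
    total, adds = coeffs[-1], 0
    for j in range(n):
        total += coeffs[n - 1 - j] * powers[j]
        muls += 1
        adds += 1
    return total, muls, adds
```

For n = 5 and k = 2, `resources(5, 2)` gives 5 adders, 9 multipliers and 6 registers, matching the embodiment, and `eval_counting` uses the same number of multiplications and additions for a 5th-order polynomial.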

Step 9: design the judgment module according to the 2m+2 fitting execution intervals; the polynomial operation module, the coefficient storage module, the floating-point subtracter and the judgment module together constitute the fitting hardware circuit shown in Fig. 4. In Fig. 4, data_i is the input source operand and data_o is the output operation result.

Step 10: as shown in Fig. 5, input an operand ω as the input value of the fitting hardware circuit, and use the judgment module to determine the fitting execution interval in which the operand ω lies;

If ω ∈ (0, +∞), read from the coefficient storage module the coefficients of the fitting polynomial corresponding to the fitting execution interval in which the operand ω lies; if ω ∈ (−∞, 0], read from the coefficient storage module the coefficients of the fitting polynomial corresponding to the symmetric interval of the fitting execution interval in which the operand ω lies;

Step 12: read the operand ω and the coefficients of the fitting polynomial corresponding to ω into the polynomial operation module and perform the fitting calculation. If ω ∈ (0, +∞), the fitting result obtained is the output value of the fitting hardware circuit; if ω ∈ (−∞, 0], the fitting result and the constant 1 are read into the floating-point subtracter, and the result it produces is the output value of the fitting hardware circuit.
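The data flow of steps 10 to 12 can be modelled behaviourally in Python. This is a sketch under stated assumptions: the interval edges are those of the embodiment, but the coefficients are produced by a NumPy least-squares fit rather than by the Remez procedure, the polynomial for a non-positive operand is evaluated at |ω| (our reading of "symmetric interval"), arithmetic is double precision rather than IEEE 754 single precision, and the names `build_rom` and `fit_circuit` are hypothetical.

```python
import bisect
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial import polynomial as P

# Right-closed interval edges on the positive axis, taken from the embodiment.
EDGES = [0.0, 1.0, 2.0, 3.0, 5.0, 7.0, 11.0, 13.816]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def build_rom(n=5, samples=2001):
    """Coefficient table: one row per fitting execution interval (low to high order)."""
    rom = []
    for lo, hi in zip(EDGES[:-1], EDGES[1:]):
        xs = np.linspace(lo, hi, samples)
        # Least-squares stand-in for the Remez coefficients.
        rom.append(Polynomial.fit(xs, sigmoid(xs), n).convert().coef)
    # Interval (b, +inf): constant term 1, all other coefficients 0.
    rom.append(np.array([1.0] + [0.0] * n))
    return rom


def fit_circuit(w, rom):
    """Judgment module, coefficient read, polynomial evaluation, optional subtraction."""
    x = abs(w)                                        # use the symmetric interval
    addr = max(bisect.bisect_left(EDGES, x) - 1, 0)   # interval (lo, hi] lookup
    result = P.polyval(x, rom[addr])
    return result if w > 0 else 1.0 - result          # f(-x) = 1 - f(x)
```

After `rom = build_rom()`, a call such as `fit_circuit(-2.5, rom)` selects the row for (2,3], evaluates the polynomial at 2.5 and subtracts the result from 1.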

The designed polynomial operation module is shown in Fig. 6, which is the implementation structure diagram of the 5th-order polynomial fitting hardware circuit used in scheme two of this embodiment, with the IEEE 754 standard single-precision floating-point data format and an operational precision no worse than 10^-6. It comprises 9 multipliers, 5 adders and 6 reg register units. The polynomial realized is p(x) = Ax⁵ + Bx⁴ + Cx³ + Dx² + Ex + F, and Result is the final output of the operation. The specific operation process is as follows:

Step f: adder Add_5 completes the operation (A*x⁵ + B*x⁴ + C*x³ + D*x² + E*x + F) and outputs the result; the number of pipeline stages of the adder is set to 2;

Step g: if the source operand lies in the interval (0, +∞), the result of the previous stage is the final result and is output directly; if the source operand lies in the interval (−∞, 0], the subtracter Add_6 performs a subtraction between 1 and the result of the previous stage, and the result of this subtraction is the final result and is output directly; the number of pipeline stages of the subtracter is set to 2.

Once the above steps are completed, the sigmoid function fitting processing of the present invention is complete. Counting the clock cycles of each step in this example, each operation stage has 2 pipeline stages and there are 7 stages in total; completing the fitting operation for a single source operand takes 15 clock cycles, the fitting precision is no worse than 10^-6, and the maximum mean square error does not exceed 8.74 × 10^-14. This fitting precision is much higher than the best fitting precision in the prior art, the resource consumption is lower, and the data format is the IEEE 754 single-precision floating-point format, so the circuit is well suited to high-precision, high-speed real-time operation.

The coefficient storage module of scheme two stores fewer fitting polynomial coefficients, which reduces storage resource consumption and the workload of computing the fitting polynomials. Because the same fitting polynomial is used on both sides of the origin, two fitting intervals symmetric about the origin have the same fitting precision, which makes error analysis more convenient. Although the overall speed of the sigmoid function fitting still meets the requirements of real-time high-speed operation, the additional subtracter and operation stage increase the consumption of computational resources and reduce the operation speed.
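The claim that symmetric intervals share the same fitting precision follows from the symmetry of the sigmoid function. The short derivation below assumes the standard definition f(x) = 1/(1 + e^{-x}) and writes p(x) for the fitting polynomial used on the positive side; it is given as a supporting sketch rather than as part of the original text.

```latex
\begin{aligned}
f(-x) &= \frac{1}{1+e^{x}} = \frac{e^{-x}}{1+e^{-x}}
       = 1 - \frac{1}{1+e^{-x}} = 1 - f(x), \\
\bigl|\, f(-x) - \bigl(1 - p(x)\bigr) \bigr|
      &= \bigl|\, 1 - f(x) - 1 + p(x) \bigr|
       = \bigl|\, f(x) - p(x) \bigr|, \qquad x > 0 .
\end{aligned}
```

The error of the output 1 − p(x) at the operand −x is therefore equal in magnitude to the error of p(x) at x, apart from the rounding of the extra subtraction.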

In summary, the present invention uses the Remez approximation algorithm to complete the sigmoid function operation quickly and effectively and achieves a high-precision fitting operation: for single-precision floating-point operation under the IEEE 754 standard, the maximum error under the requirements of a high-precision hardware implementation does not exceed 10^-6, and for data not in the IEEE 754 format, this structure can likewise achieve better fitting precision than the prior art under equivalent technical requirements. The circuit structure of this method is simple and limited in scale, the operation can be completed with a small number of adders and multipliers, the consumption of computational resources is greatly reduced, and flexibility is higher. While meeting the requirements of high-speed and parallel operation, the method effectively improves the precision and performance of the sigmoid function fitting operation and resolves the bottleneck faced by the prior art.

Claims

1. A sigmoid function fitting hardware circuit based on the Remez approximation algorithm, characterized in that it is obtained according to the following steps:

Step 3: using the symmetry shown in formula (2), divide the fitting interval [a, b], with the origin 0 as the center of symmetry, into 2m+2 subintervals [a, q₁], (q₁, q₂], …, (q_m, 0], (0, q_{m+1}], …, (q_{2m}, b]; a, q₁, q₂, …, q_m, 0, q_{m+1}, …, q_{2m}, b respectively denote the endpoint values of the 2m+2 subintervals; q₁, q₂, …, q_m, q_{m+1}, …, q_{2m} respectively denote the scaling endpoint values of the 2m subintervals; the scaling endpoint values of the 2m subintervals form, in order, the endpoint set Q = {Q₀, Q₁, …, Q_t, …, Q_{2m-1}}; Q_t denotes the t-th endpoint value among the scaling endpoint values of the 2m subintervals; thereby the piecewise intervals [Q₀, Q₁], [Q₁, Q₂], …, [Q_t, Q_{t+1}], …, [Q_{2m-1}, Q_{2m}] are obtained; t = 0, 1, …, 2m−1;

f(−x) = 1 − f(x)    (2)

Step 4: combine the order n with each of the m piecewise intervals on the interval (0, b] to form m vector groups [n, Q_m, Q_{m+1}], [n, Q_{m+1}, Q_{m+2}], …, [n, Q_ε, Q_{ε+1}], …, [n, Q_{2m-1}, Q_{2m}]; ε = m, m+1, …, 2m−1, and [n, Q_ε, Q_{ε+1}] denotes the ε-th vector group; substitute the m vector groups in turn into the Remez algorithm, thereby obtaining in turn the approximation accuracies u_m'', u_{m+1}'', …, u_t'', …, u_{2m-1}'' corresponding to the respective piecewise intervals;

Step 4.1: using formula (5), obtain the set of n+2 Chebyshev polynomial alternation points corresponding to the ε-th vector group [n, Q_ε, Q_{ε+1}], and take this ε-th alternation point set as the ε-th initial point set, thereby obtaining the initial point set corresponding to each of the m vector groups;

In formula (3), λ=0,1 ..., n+1；

Step 4.2: using the ε-th initial point set, solve the system of linear equations shown in formula (6) to obtain its solution, and from the solution obtain the ε-th initial approximating polynomial;

Step 4.3: in the ε-th piecewise interval [Q_ε, Q_{ε+1}], obtain the independent variable at which |f(x) − p_ε'(x)| attains its maximum value; the independent variable is characterized with respect to […] and […];

If […] and […], then […] is used in place of […];

If […] and […], then […] is used in place of […], β = 1, 2, …, n; thereby the update point set of the ε-th initial point set is obtained;

Step 4.4: using the update point set of the ε-th initial point set, solve the system of linear equations shown in formula (6) to obtain the updated solution, and from the updated solution obtain the ε-th updated approximating polynomial;

Determine whether |u_ε'' − u_ε'| ≤ eps holds. If so, take u_ε'' as the approximation accuracy corresponding to the ε-th piecewise interval [Q_ε, Q_{ε+1}]; otherwise, repeat steps 4.3 to 4.4 until |u_ε'' − u_ε'| ≤ eps holds; eps denotes the convergence control precision of the approximation error;

Step 5: check in turn whether each of the approximation accuracies u_m'', u_{m+1}'', …, u_t'', …, u_{2m-1}'' meets the fitting precision u. If it does, the corresponding piecewise interval is taken as a fitting execution interval, and the coefficients of the corresponding approximating polynomial are taken as the fitting polynomial coefficients of the fitting execution interval. If it does not, scale the scaling endpoint values of the piecewise interval whose approximation accuracy fails the requirement and return to step 4, until m fitting execution intervals and m groups of fitting polynomial coefficients meeting the fitting precision u are obtained;

Step 6: if the independent variable x of the sigmoid function f(x) lies in the interval (b, +∞), the interval (b, +∞) is taken as a fitting execution interval, and the fitting polynomial corresponding to (b, +∞) has a constant-term coefficient of 1 and all remaining coefficients equal to 0; thereby m+1 n-th order fitting polynomials are obtained and the fitting of the sigmoid function is completed;

Step 8: according to the n-th order fitting polynomial, design the polynomial operation module using n floating-point adders, 2n−1 floating-point multipliers and (n−2) × k register units, and place one floating-point subtracter at the output of the polynomial operation module; k is the number of pipeline stages of the floating-point adder, the floating-point multiplier and the floating-point subtracter;

Step 9: design the judgment module according to the 2m+2 fitting execution intervals; the polynomial operation module, the coefficient storage module, the floating-point subtracter and the judgment module together constitute the fitting hardware circuit;

Step 10: input an operand ω as the input value of the fitting hardware circuit, and use the judgment module to determine the fitting execution interval in which the operand ω lies;

If ω ∈ (0, +∞), read from the coefficient storage module the coefficients of the fitting polynomial corresponding to the fitting execution interval in which the operand ω lies;

If ω ∈ (−∞, 0], read from the coefficient storage module the coefficients of the fitting polynomial corresponding to the symmetric interval of the fitting execution interval in which the operand ω lies;

Step 12: read the operand ω and the coefficients of the fitting polynomial corresponding to the operand ω into the polynomial operation module and perform the fitting calculation. If ω ∈ (0, +∞), the fitting result obtained is the output value of the fitting hardware circuit; if ω ∈ (−∞, 0], the fitting result and the constant 1 are read into the floating-point subtracter, and the result it produces is the output value of the fitting hardware circuit.