CN116578338A - Low-delay trigonometric function hardware acceleration algorithm - Google Patents

Info

Publication number
CN116578338A
Authority
CN
China
Prior art keywords
iteration
value
trigonometric function
low
hardware acceleration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211579750.XA
Other languages
Chinese (zh)
Inventor
周柯
奉斌
金庆忍
俞小勇
王晓明
卢柏桦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of Guangxi Power Grid Co Ltd
Original Assignee
Electric Power Research Institute of Guangxi Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of Guangxi Power Grid Co Ltd
Priority to CN202211579750.XA
Publication of CN116578338A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a low-delay trigonometric function hardware acceleration algorithm comprising the following steps: converting the input angle $\theta_0$ into the first quadrant using a front-end module; determining the initial value $(x_0, y_0, z_0)$ of the calculation, which for sine and cosine calculation is $(1/K_n, 0, \theta)$; obtaining the iteration value $(x_N, y_N)$ with an iteration module that uses a two-step merged iteration method; and obtaining the value of the trigonometric function with a post-processing module. Using two-step merged iteration in the iterative operation reduces the hardware consumption of the hardware module and improves calculation speed. When calculating the arctangent angle, the invention combines table lookup with approximate substitution, which improves calculation speed and reduces consumption of ROM resources.

Description

Low-delay trigonometric function hardware acceleration algorithm
Technical Field
The invention relates to the technical field of hardware acceleration algorithms, in particular to a low-delay trigonometric function hardware acceleration algorithm.
Background
A new-type power distribution network, dominated by new energy sources and characterized by digitalization and intelligence, requires extensive information acquisition and computational analysis to ensure a safe, stable, and reliable power supply. Power quality is an important measure of supply level and a key object of acquisition and analysis in such a network. However, the terminal side of the distribution network currently has serious shortcomings in power quality acquisition and analysis, which places a heavy computational burden on the edge side of the grid. A power chip with power quality processing capability is therefore needed to relieve edge-side computing pressure and promote grid-edge integration.
Power quality processing involves a large number of trigonometric function operations. Current power chips mostly compute trigonometric functions by table lookup or Taylor expansion. Table lookup occupies a large amount of memory and has limited precision. Taylor expansion requires many operations, runs slowly, occupies substantial CPU resources, and consumes considerable power. A method of computing trigonometric functions in dedicated on-chip hardware is therefore needed to improve the computing capability of the power chip and reduce its power consumption.
Disclosure of Invention
The low-delay trigonometric function hardware acceleration algorithm of the invention addresses the problems described in the background: existing trigonometric function methods carry a heavy computational load, which leads to low operation speed, high memory use, and high hardware consumption.
In order to achieve the above purpose, the present invention provides the following technical solutions: a low-latency trigonometric function hardware acceleration algorithm, comprising the steps of:
S1: converting the input angle $\theta_0$ into the first quadrant using a front-end module;
S2: determining the initial value $(x_0, y_0, z_0)$ of the calculation; for sine and cosine calculation, the given initial value is $(1/K_n, 0, \theta)$;
S3: obtaining the iteration value $(x_N, y_N)$ with an iteration module that uses a two-step merged iteration method;
S4: obtaining the value of the trigonometric function with a post-processing module.
Preferably, the step S3 includes the steps of:
S301: inputting the initial value $(x_0, y_0, z_0)$ of the iterative calculation;
S302: determining whether the iteration count $i$ has reached the maximum iteration number $N$; if so, outputting the iteration result; if not, executing steps S303 to S307;
S303: calculating the arctangent $\tan^{-1} 2^{-i}$ by a method combining table lookup and approximate substitution;
S304: calculating the iteration directions $d_i$, $d_{i+1}$ and the iteration values $z_i$, $z_{i+1}$ of the $i$-th and $(i+1)$-th steps according to the following formula:
S305: calculating the $(i+1)$-th iteration values $x_{i+1}$, $y_{i+1}$ according to the following formula:
S306: updating the iteration count $i$;
S307: returning to step S302.
Preferably, the front-end module in step S1 performs the following operations: when $0 \le \theta_0 \le \pi/2$, the converted angle is $\theta = \theta_0$; when $\pi/2 < \theta_0 \le \pi$, the converted angle is $\theta = \theta_0 - \pi/2$; when $\pi < \theta_0 \le 3\pi/2$, the converted angle is $\theta = \theta_0 - \pi$; when $3\pi/2 < \theta_0 \le 2\pi$, the converted angle is $\theta = \theta_0 - 3\pi/2$.
Preferably, the operation formula of the iteration module in the step S3 is as follows:
wherein $d_i = \operatorname{sign}(z_i)$ is the iteration direction.
Preferably, the post-processing module in step S4 includes the following algorithm:
Preferably, $\tan^{-1} 2^{-i}$ is obtained by the following steps:
S01: pre-storing the values of $\tan^{-1} 2^{-i}$ ($i = 1, 2, 3, \ldots, m$) in ROM;
S02: comparing $i$ with $m$; when $i < m$, executing step S03; when $i \ge m$, executing step S04;
S03: obtaining the value of $\tan^{-1} 2^{-i}$ by table lookup in ROM;
S04: taking the value of $\tan^{-1} 2^{-i}$ to be $2^{-i}$.
Preferably, the value of m is as follows:
wherein N is the set maximum iteration number.
Preferably, steps S1 to S4 are implemented by a hardwired circuit.
Preferably, the maximum iteration number N has a value of 16.
Preferably, the maximum iteration number N has a value of 32.
The beneficial effects of the invention are as follows:
The invention uses a two-step merged iteration method in the iterative operation, which reduces the hardware consumption of the hardware module and improves calculation speed. When calculating the arctangent angle, it combines table lookup with approximate substitution, which improves calculation speed and reduces consumption of ROM resources.
Drawings
FIG. 1 is a block diagram of a trigonometric function low-delay hardware acceleration algorithm of the present invention;
FIG. 2 is a flow chart of an iterative algorithm module of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.
The invention provides a low-delay trigonometric function hardware acceleration algorithm, which is described by combining with FIG. 1, and comprises the following steps:
S1: converting the input angle $\theta_0$ into the first quadrant using a front-end module;
S2: determining the initial value $(x_0, y_0, z_0)$ of the calculation; for sine and cosine calculation, the given initial value is $(1/K_n, 0, \theta)$;
S3: obtaining the iteration value $(x_N, y_N)$ with an iteration module that uses a two-step merged iteration method;
S4: obtaining the value of the trigonometric function with a post-processing module.
The process and principle of a low-delay trigonometric function hardware acceleration algorithm of the invention are described below in conjunction with the four steps described above.
Step S1: converting the input angle $\theta_0$ into the first quadrant using the front-end module.
Depending on the value of the input $\theta_0$, the front-end module applies one of the following conversions so that the converted angle $\theta$ always lies in the first quadrant: when $0 \le \theta_0 \le \pi/2$, $\theta = \theta_0$; when $\pi/2 < \theta_0 \le \pi$, $\theta = \theta_0 - \pi/2$; when $\pi < \theta_0 \le 3\pi/2$, $\theta = \theta_0 - \pi$; when $3\pi/2 < \theta_0 \le 2\pi$, $\theta = \theta_0 - 3\pi/2$.
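The four cases of step S1 can be sketched in software as follows. This is only an illustration of the mapping, not the hardware front-end module; the function name `to_first_quadrant` and the returned quadrant index (0 to 3) are hypothetical additions that make it easy to undo the shift later.

```python
import math

def to_first_quadrant(theta0):
    """Map theta0 in [0, 2*pi] to (quadrant index, angle in [0, pi/2]).

    The quadrant index is a hypothetical helper value, not named in the source.
    """
    half_pi = math.pi / 2
    if not 0.0 <= theta0 <= 2 * math.pi:
        raise ValueError("theta0 must lie in [0, 2*pi]")
    if theta0 <= half_pi:            # 0 <= theta0 <= pi/2
        return 0, theta0
    if theta0 <= math.pi:            # pi/2 < theta0 <= pi
        return 1, theta0 - half_pi
    if theta0 <= 3 * half_pi:        # pi < theta0 <= 3*pi/2
        return 2, theta0 - math.pi
    return 3, theta0 - 3 * half_pi   # 3*pi/2 < theta0 <= 2*pi

for angle in (0.3, 2.0, 4.0, 5.5):
    print(angle, to_first_quadrant(angle))
```

Each boundary value is assigned to the lower quadrant, matching the inclusive upper bounds in the text.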
Step S2: determining the initial value $(x_0, y_0, z_0)$ of the calculation; for sine and cosine calculation, the given initial value is $(1/K_n, 0, \theta)$.
Here $K_n$ is treated as a constant. After $n$ iterations with $n$ large, $K_n$ is approximately 0.607, so the fixed value $K_n = 0.607$ is used.
Step S3: obtaining the iteration value $(x_N, y_N)$ with the iteration module, using the two-step merged iteration method.
Two-step merged iteration combines two successive iterations into a single pass. An ordinary iterative method performs one iteration per pass, so the merged method needs half as many passes and computes faster. The algorithm of the iteration module can be summarized by the following formula:
wherein $d_i = \operatorname{sign}(z_i)$ is the iteration direction.
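The formula itself is not reproduced in this text. As a hedged illustration only, assume each single step is the conventional CORDIC rotation; substituting one step into the next then gives a merged two-step update. The indexing below (two steps advancing $i$ to $i+2$) is an assumption and may differ from the source's numbering.

```latex
\begin{aligned}
\text{single step:}\quad
  x_{i+1} &= x_i - d_i\,2^{-i}\,y_i, \qquad
  y_{i+1} = y_i + d_i\,2^{-i}\,x_i, \qquad
  z_{i+1} = z_i - d_i \tan^{-1} 2^{-i} \\[4pt]
\text{merged:}\quad
  d_{i+1} &= \operatorname{sign}\!\left(z_i - d_i \tan^{-1} 2^{-i}\right) \\
  x_{i+2} &= \left(1 - d_i d_{i+1}\,2^{-(2i+1)}\right) x_i
             - \left(d_i\,2^{-i} + d_{i+1}\,2^{-(i+1)}\right) y_i \\
  y_{i+2} &= \left(1 - d_i d_{i+1}\,2^{-(2i+1)}\right) y_i
             + \left(d_i\,2^{-i} + d_{i+1}\,2^{-(i+1)}\right) x_i \\
  z_{i+2} &= z_i - d_i \tan^{-1} 2^{-i} - d_{i+1} \tan^{-1} 2^{-(i+1)}
\end{aligned}
```

Every coefficient is a sum of signed powers of two, so the merged update still needs only shifts and additions.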
Fig. 2 shows the operation steps of the iteration module, which can be expressed as the following steps:
S301: inputting the initial value $(x_0, y_0, z_0)$ of the iterative calculation;
S302: determining whether the iteration count $i$ has reached the maximum iteration number $N$; if so, outputting the iteration result; if not, executing steps S303 to S307;
The maximum iteration number $N$ is 16 or 32, values commonly used in the industry, chosen so that an accurate output is obtained with as few iterations as possible.
S303: calculating the arctangent $\tan^{-1} 2^{-i}$ by a method combining table lookup and approximate substitution;
S304: calculating the iteration directions $d_i$, $d_{i+1}$ and the iteration values $z_i$, $z_{i+1}$ of the $i$-th and $(i+1)$-th steps according to the following formula:
S305: calculating the $(i+1)$-th iteration values $x_{i+1}$, $y_{i+1}$ according to the following formula:
S306: updating the iteration count $i$;
S307: returning to step S302.
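The loop S301 to S307 can be sketched in floating-point Python as below. This is a software model under stated assumptions, not the patented circuit: it assumes conventional CORDIC rotations starting at $i = 0$, an even $N$, a scale constant computed as a product rather than the fixed 0.607, and `math.atan` standing in for the arctangent source of step S303. The name `cordic_merged` is hypothetical.

```python
import math

def cordic_merged(theta, n=16):
    """Return (cos theta, sin theta) for theta in [0, pi/2].

    Performs n micro-rotations, two per loop pass (n is assumed even).
    """
    # Assumed initial x: product of 1/sqrt(1 + 2^(-2i)), i = 0 .. n-1.
    x = 1.0
    for i in range(n):
        x /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    y, z = 0.0, theta                      # S301: initial values
    for i in range(0, n, 2):               # S302 / S306 / S307: loop control
        a0, a1 = 2.0 ** -i, 2.0 ** -(i + 1)
        d0 = 1.0 if z >= 0 else -1.0       # S304: direction of first step
        z1 = z - d0 * math.atan(a0)
        d1 = 1.0 if z1 >= 0 else -1.0      # S304: direction of second step
        z = z1 - d1 * math.atan(a1)
        c = 1.0 - d0 * d1 * a0 * a1        # 1 - d_i d_{i+1} 2^-(2i+1)
        s = d0 * a0 + d1 * a1              # d_i 2^-i + d_{i+1} 2^-(i+1)
        x, y = c * x - s * y, c * y + s * x  # S305: merged x/y update
    return x, y

c, s = cordic_merged(0.5)
print(c, math.cos(0.5))
print(s, math.sin(0.5))
```

In hardware the products with `c` and `s` would be realized as shifts and additions; the multiplications here only keep the model short.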
Wherein, the arctangent tan is described in step S303 -1 2 -i The method is obtained by combining storage and approximate substitution, and is characterized in that when calculating the arc tangent angle by a single table look-up method, all the reverse switching values are required to be calculated in advance and stored in the ROM in advance, and a large memory is required to be consumed; the single approximate substitution method replaces the arctangent value at the sampling angle value, but at the initial stage of calculation, but at the early stage of the iterative algorithm, the angle value is larger, and the approximate substitution error is larger; the method combining the table look-up and the approximate substitution adopts the table look-up to calculate the arctangent value in the early stage of iteration, and adopts the approximate substitution method to calculate the arctangent value in the later stage of iteration, so that the larger memory consumption can be reduced, and the calculation accuracy can be increased.
Said arctangent tan -1 2 -i The steps obtained by the method combining storage and approximate substitution are as follows:
s01: tan (r) -1 2 -i Values of (i=1, 2,3 … m) are pre-stored in ROM;
s02: judging the size relation between i and m, executing the step S03 when i is smaller than m, and executing the step S04 when i is larger than or equal to m;
s03: obtaining the value of tan-12-i by looking up a table in ROM;
s04: the tan-12-i has a value of 2-i.
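Steps S01 to S04 can be sketched as a small lookup helper. The source's expression for $m$ is not reproduced in this text, so `m` is left as a free parameter here. The sketch also assumes the table holds indices 0 to $m-1$, a Python list stands in for the ROM, and `make_atan_lookup` is a hypothetical name.

```python
import math

def make_atan_lookup(m):
    """Return f(i) giving tan^-1(2^-i): table for i < m, 2^-i otherwise."""
    # S01: precomputed table (a list stands in for the ROM).
    rom = [math.atan(2.0 ** -i) for i in range(m)]

    def atan_pow2(i):
        if i < m:                # S02 -> S03: table lookup
            return rom[i]
        return 2.0 ** -i         # S02 -> S04: approximate substitution

    return atan_pow2

atan_pow2 = make_atan_lookup(6)
for i in (0, 3, 6, 10):
    exact = math.atan(2.0 ** -i)
    print(i, atan_pow2(i), abs(atan_pow2(i) - exact))
```

Because $\tan^{-1} t = t - t^3/3 + \ldots$, the substitution error at index $i$ is roughly $2^{-3i}/3$, which is why the approximation is only acceptable for the later, smaller angles.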
Step S4: obtaining the value of the trigonometric function with the post-processing module.
The post-processing module in step S4 includes the following algorithm:
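The post-processing formula is not reproduced in this text. Given the quadrant shifts of step S1, the standard angle-addition identities suggest the mapping sketched below. It is an assumption about what the module computes, and the quadrant index (0 to 3) and the name `post_process` are hypothetical.

```python
import math

def post_process(quadrant, cos_t, sin_t):
    """Recover (sin theta0, cos theta0) from first-quadrant cos/sin values."""
    if quadrant == 0:      # theta0 = theta
        return sin_t, cos_t
    if quadrant == 1:      # theta0 = theta + pi/2
        return cos_t, -sin_t
    if quadrant == 2:      # theta0 = theta + pi
        return -sin_t, -cos_t
    if quadrant == 3:      # theta0 = theta + 3*pi/2
        return -cos_t, sin_t
    raise ValueError("quadrant must be 0, 1, 2 or 3")

theta = 0.4
for k in range(4):
    s, c = post_process(k, math.cos(theta), math.sin(theta))
    theta0 = theta + k * math.pi / 2
    print(k, abs(s - math.sin(theta0)), abs(c - math.cos(theta0)))
```

Only sign changes and a swap of the two outputs are involved, so this stage adds no arithmetic beyond negation.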
In summary, the invention computes trigonometric functions with a hardware circuit, which greatly improves the operation speed of the algorithm and reduces the computational load and power consumption of the CPU. Compared with other calculation methods, the algorithm uses two-step merged iteration in the iterative operation, which reduces the hardware consumption of the hardware module and improves calculation speed. When calculating the arctangent angle, it combines table lookup with approximate substitution, which improves calculation speed and reduces consumption of ROM resources.
First, the two-step merged iteration method increases calculation speed, and because the number of iteration passes is reduced, it also lowers hardware consumption. Merging two iterations into one pass makes the method faster than an ordinary iterative method, which performs one iteration per pass. Compared with the prior art, the invention therefore offers higher calculation speed and lower hardware consumption.
Second, the invention calculates the arctangent by combining table lookup with approximate substitution. A pure table lookup method requires every arctangent value to be computed in advance and stored in ROM, which consumes a large amount of memory. A pure approximate substitution method replaces the arctangent value with its argument $2^{-i}$, but in the early stage of the iterative algorithm the angle value is large, so the approximation error is large. Neither method is satisfactory on its own. The combined method uses table lookup in the early iterations and approximate substitution in the later iterations, which reduces memory consumption while preserving calculation accuracy.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention, and are intended to be included within the scope of the appended claims and description.

Claims (10)

1. A low-latency trigonometric function hardware acceleration algorithm, comprising the steps of:
S1: converting the input angle $\theta_0$ into the first quadrant using a front-end module;
S2: determining the initial value $(x_0, y_0, z_0)$ of the calculation; for sine and cosine calculation, the given initial value is $(1/K_n, 0, \theta)$;
S3: obtaining the iteration value $(x_N, y_N)$ with an iteration module that uses a two-step merged iteration method;
S4: obtaining the value of the trigonometric function with a post-processing module.
2. The low-latency trigonometric function hardware acceleration algorithm according to claim 1, characterized in that said step S3 comprises the steps of:
S301: inputting the initial value $(x_0, y_0, z_0)$ of the iterative calculation;
S302: determining whether the iteration count $i$ has reached the maximum iteration number $N$; if so, outputting the iteration result; if not, executing steps S303 to S307;
S303: calculating the arctangent $\tan^{-1} 2^{-i}$ by a method combining table lookup and approximate substitution;
S304: calculating the iteration directions $d_i$, $d_{i+1}$ and the iteration values $z_i$, $z_{i+1}$ of the $i$-th and $(i+1)$-th steps according to the following formula:
S305: calculating the $(i+1)$-th iteration values $x_{i+1}$, $y_{i+1}$ according to the following formula:
S306: updating the iteration count $i$;
S307: returning to step S302.
3. The low-latency trigonometric function hardware acceleration algorithm according to claim 1, wherein the front-end module in step S1 performs the following operations: when $0 \le \theta_0 \le \pi/2$, the converted angle is $\theta = \theta_0$; when $\pi/2 < \theta_0 \le \pi$, the converted angle is $\theta = \theta_0 - \pi/2$; when $\pi < \theta_0 \le 3\pi/2$, the converted angle is $\theta = \theta_0 - \pi$; when $3\pi/2 < \theta_0 \le 2\pi$, the converted angle is $\theta = \theta_0 - 3\pi/2$.
4. The low-latency trigonometric function hardware acceleration algorithm according to claim 1, wherein the operation formula of the iterative module in step S3 is as follows:
wherein $d_i = \operatorname{sign}(z_i)$ is the iteration direction.
5. A low-latency trigonometric function hardware acceleration algorithm according to claim 3, wherein the post-processing module of step S4 includes the following algorithm:
6. A low-latency trigonometric function hardware acceleration algorithm according to claim 2 or 4, characterized in that $\tan^{-1} 2^{-i}$ is obtained by the following steps:
S01: pre-storing the values of $\tan^{-1} 2^{-i}$ ($i = 1, 2, 3, \ldots, m$) in ROM;
S02: comparing $i$ with $m$; when $i < m$, executing step S03; when $i \ge m$, executing step S04;
S03: obtaining the value of $\tan^{-1} 2^{-i}$ by table lookup in ROM;
S04: taking the value of $\tan^{-1} 2^{-i}$ to be $2^{-i}$.
7. The low-latency trigonometric function hardware acceleration algorithm according to claim 6, wherein the value of m is as follows:
wherein N is the set maximum iteration number.
8. The low-latency trigonometric function hardware acceleration algorithm according to claim 1, wherein steps S1 to S4 are implemented by a hardwired circuit.
9. A low-latency trigonometric function hardware acceleration algorithm according to claim 2, characterized in that: the maximum iteration number N takes a value of 16.
10. A low-latency trigonometric function hardware acceleration algorithm according to claim 2, characterized in that: the maximum iteration number N takes a value of 32.
CN202211579750.XA 2022-12-06 2022-12-06 Low-delay trigonometric function hardware acceleration algorithm Pending CN116578338A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211579750.XA CN116578338A (en) 2022-12-06 2022-12-06 Low-delay trigonometric function hardware acceleration algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211579750.XA CN116578338A (en) 2022-12-06 2022-12-06 Low-delay trigonometric function hardware acceleration algorithm

Publications (1)

Publication Number Publication Date
CN116578338A true CN116578338A (en) 2023-08-11

Family

ID=87544097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211579750.XA Pending CN116578338A (en) 2022-12-06 2022-12-06 Low-delay trigonometric function hardware acceleration algorithm

Country Status (1)

Country Link
CN (1) CN116578338A (en)

Similar Documents

Publication Publication Date Title
EP3674883B1 (en) Multiplication circuit, system on chip, and electronic device
CN111581593A (en) Configurable reuse sectional type lookup table activation function implementation device
WO2021057085A1 (en) Hybrid precision storage-based depth neural network accelerator
CN107918292B (en) Exponential integration-oriented power electronic circuit transient simulation GPU (graphics processing Unit) acceleration method
CN111813371B (en) Floating point division operation method, system and readable medium for digital signal processing
CN102566965B (en) Floating-point number logarithmic operation device with flat errors
US8549056B2 (en) Apparatus and program for arctangent calculation
CN102567254B (en) The method that adopts dma controller to carry out data normalization processing
CN116578338A (en) Low-delay trigonometric function hardware acceleration algorithm
CN104714773A (en) Embedded rotation angle calculation IP soft core based on PLB bus and rotation angle calculation method
CN111984057B (en) GPU-based digital NCO high-precision parallel implementation method
CN112734023A (en) Reconfigurable circuit applied to activation function of recurrent neural network
CN114996638A (en) Configurable fast Fourier transform circuit with sequential architecture
CN105939160B (en) Low memory capacity Turbo code decoder and design method in LTE-Advanced standard
CN104683817B (en) Parallel transformation and inverse transform method based on AVS
CN108319804B (en) 8192 point base 2 DIT ASIC design method for low resource call
CN112968473A (en) AC-DC hybrid power distribution network robust state estimation method and terminal equipment
CN102832951B (en) Realizing method for LDPC (Low Density Parity Check) coding formula based on probability calculation
US20150178047A1 (en) Method of fast arctangent calculation pre and post processing
CN111984056A (en) GPU (graphics processing Unit) texture cache and accumulated error compensation based numerically-controlled oscillator and implementation method
CN111695080B (en) Power grid state estimation method of GPU parallel acceleration preprocessing conjugate gradient iteration method
CN109687870A (en) The SARADC capacitance mismatch bearing calibration of charge redistribution type and system
CN109214059B (en) Graphic processor memory management method for power electronic efficient transient simulation
CN116679988B (en) Hardware acceleration unit, hardware acceleration method, chip and storage medium
CN109271250B (en) Graphic processor memory management method oriented to power electronic transient simulation acceleration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination