CN113778379A - CORDIC-based low-complexity hardware system and application method - Google Patents


Publication number
CN113778379A
Authority
CN
China
Legal status
Pending
Application number
CN202111135783.0A
Other languages
Chinese (zh)
Inventor
李丽
徐瑾
傅玉祥
陈辉
蒋林
武瑞琪
何书专
陈健
Current Assignee
Nanjing Ningqi Intelligent Computing Chip Research Institute Co ltd
Original Assignee
Nanjing Ningqi Intelligent Computing Chip Research Institute Co ltd
Application filed by Nanjing Ningqi Intelligent Computing Chip Research Institute Co ltd
Priority to CN202111135783.0A
Publication of CN113778379A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/544 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
    • G06F7/552 Powers or roots, e.g. Pythagorean sums
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/491 Computations with decimal numbers radix 12 or 20
    • G06F7/4912 Adding; Subtracting

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Optimization (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a CORDIC (COordinate Rotation DIgital Computer)-based low-complexity hardware system and application method. In the prior art, computing complex N-th roots in software is slow and inefficient, while hardware implementations are complex and must cope with a number of results that varies with N. To address these problems, the invention builds the system from CORDIC operators in the circular, linear and hyperbolic coordinate systems together with an efficient parallel result processing unit. It enlarges the convergence range: software simulation supports inputs from $10^{-8}$ to $10^{4}$ with a relative error on the order of $10^{-6}$, giving a wide supported range and high accuracy. The invention uses a pipeline architecture for high-speed, fully pipelined computation; it reduces computational complexity by exploiting the similarity among the computations of the N roots of a complex N-th root; and it dynamically supports complex roots of order 2 to 10. The method offers high efficiency, high accuracy and low hardware complexity.

Description

CORDIC-based low-complexity hardware system and application method
Technical Field
The invention relates to the technical field of complex N-th root computation, and in particular to a CORDIC-based low-complexity hardware system and application method.
Background
Complex arithmetic is a core part of circuit computation and is widely used in communication systems and signal processing for real-time data representation and system modeling. The N-th root is an important part of complex function theory, and it is used in polynomial computation, matrix computation, trigonometric functions and similar tasks to simplify the computation. However, the complex N-th root is computationally demanding, both because the number of roots depends on N and because complex arithmetic is itself costly. Most research on N-th roots focuses on real numbers, and complex N-th roots are usually computed in software. Software implementations typically combine several general-purpose algorithms rather than a dedicated one in order to guarantee accurate and reliable results, which introduces redundancy and leads to poor real-time performance.
Another approach is to accelerate N-th root computation with application-specific integrated circuit (ASIC) hardware to achieve high performance. However, only a few works address hardware implementation of complex roots, and these cover only the square root; since higher-order roots are widely used in fields such as atmospheric modeling and radiation, square roots alone cannot meet application requirements. Moreover, in a dedicated hardware implementation the resources consumed by N-dependent constants grow in proportion to the supported range of N, and in practice the most common values of N are the integers from 2 to 10.
A coordinate rotation digital computer (CORDIC) can efficiently evaluate transcendental functions such as trigonometric, exponential and logarithmic functions using only shift and add operations. It achieves high computation speed, offers a good trade-off between accuracy and area, and has low cost. The invention therefore provides a CORDIC-based low-complexity hardware solution for computing complex roots of order 2 to 10 that achieves high computational efficiency while reducing hardware complexity.
Because the number of results is not fixed, N-th root computation has long been a challenging problem, and most hardware implementations focus on real N-th roots or on the square root only. Chinese patent application No. CN202011357034.8, published on 12 March 2021, discloses a CORDIC-based method for computing the N-th roots of a complex number; it is earlier work by the applicant of the present invention. Although that method can compute roots of arbitrary order N, it computes the N-th roots of each input complex number serially according to N, producing only 1 of the N results at a time. Since an N-th root has N results, the circuit complexity grows rapidly as N increases, so the computational efficiency and flexibility of that method are insufficient.
Disclosure of Invention
1. Technical problem to be solved
To address the problems in the prior art that software computation of complex N-th roots is slow and inefficient, and that hardware computation of complex N-th roots is complex and must cope with a varying number of results, the invention provides a CORDIC-based low-complexity hardware system and application method. They adopt a pipeline architecture and exploit the similarity among the computations of the N roots of a complex N-th root to reduce computational cost, offering a wide supported range, high efficiency, high accuracy and low hardware complexity.
2. Technical scheme
The purpose of the invention is realized by the following technical scheme.
A CORDIC-based low-complexity hardware system comprises a circular vectoring mode CV-CORDIC module, a circular rotation mode CR-CORDIC module, a first linear vectoring mode LV1-CORDIC module, a second linear vectoring mode LV2-CORDIC module, a hyperbolic vectoring mode HV-CORDIC module, a hyperbolic rotation mode HR-CORDIC module and a result processing unit;
the input data of the system enter the input of the CV-CORDIC module; the output of the CV-CORDIC module is connected to the inputs of the HV-CORDIC module and the LV2-CORDIC module; the output of the HV-CORDIC module is connected to the input of the LV1-CORDIC module; the output of the LV1-CORDIC module is connected to the input of the HR-CORDIC module; the outputs of the HR-CORDIC module and the LV2-CORDIC module are connected to the input of the CR-CORDIC module; the output of the CR-CORDIC module is connected to the input of the result processing unit; and the result processing unit outputs the computation result.
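As a rough illustration of why the shorter LV2 branch needs delay buffering, the sketch below walks the module connections just listed and computes the cycle at which data reach each module. It assumes every CORDIC module is a 23-stage pipeline (the stage count given elsewhere in the description) and ignores all other latency; `EDGES`, `ORDER` and `arrival_times` are hypothetical names introduced only for this sketch.

```python
STAGES = 23  # assumed latency of each CORDIC module: one iteration per pipeline stage

# Producer -> consumers, following the module connections described above.
EDGES = {
    "CV": ["HV", "LV2"],
    "HV": ["LV1"],
    "LV1": ["HR"],
    "HR": ["CR"],
    "LV2": ["CR"],
    "CR": ["RESULT"],
}

# A topological order of the modules: every producer appears before its consumers.
ORDER = ["CV", "HV", "LV2", "LV1", "HR", "CR", "RESULT"]


def arrival_times(edges, order, stages=STAGES):
    """For each module, list the cycles at which each of its input paths delivers data."""
    arrivals = {order[0]: [0]}
    for node in order:
        start = max(arrivals[node])  # a module starts once all of its inputs are present
        for consumer in edges.get(node, []):
            arrivals.setdefault(consumer, []).append(start + stages)
    return arrivals


arr = arrival_times(EDGES, ORDER)
cr_inputs = sorted(arr["CR"])
buffer_depth = cr_inputs[-1] - cr_inputs[0]  # extra delay the LV2 path must absorb
print(cr_inputs, buffer_depth, max(arr["RESULT"]))  # prints [46, 92] 46 115
```

Under these assumptions the LV2 output reaches the CR-CORDIC module 46 cycles before the HR output, which is the kind of gap a multi-stage buffer on the LV2 path has to cover.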
The invention uses an efficient parallel result processing unit, reduces computational complexity by exploiting the similarity among the computations of the N roots of a complex N-th root, and adopts a pipeline architecture. It dynamically supports complex roots of order 2 to 10, saves storage resources, and overcomes the high computational complexity and long computation time of N-th roots.
Further, the system uses a pipelined architecture.
Further, the result processing unit comprises a plurality of parallel result units.
The invention builds an efficient hardware implementation from CORDIC modules in the circular, linear and hyperbolic coordinate systems together with a parallel result processing unit; its pipeline architecture enables high-speed, fully pipelined computation.
A method for computing complex 2nd- to 10th-order roots with the CORDIC-based low-complexity hardware system described above: the complex number is written as $z = m + jn$, and the N-th root of z is computed from the exponential form of z, where N is an integer from 2 to 10 inclusive.
Specifically:
$$\sqrt[N]{z} = \sqrt[N]{\rho}\, e^{j\frac{\theta + 2d\pi}{N}} = \sqrt[N]{\rho}\left(\cos\frac{\theta + 2d\pi}{N} + j\sin\frac{\theta + 2d\pi}{N}\right)$$
where
$$\rho = \sqrt{m^2 + n^2}$$
and d = 0, 1, ..., N-1; when $m \ge 0$,
$$\theta = \arctan\frac{n}{m};$$
when $m < 0$ and $n \ge 0$,
$$\theta = \arctan\frac{n}{m} + \pi;$$
when $m < 0$ and $n < 0$,
$$\theta = \arctan\frac{n}{m} - \pi.$$
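A floating-point reference model of these formulas can be handy when checking a hardware implementation. The sketch below follows the three cases for θ literally; the extra `m == 0` branch is an addition of this sketch (the $m \ge 0$ case would otherwise divide by zero), and `theta_of` and `nth_roots` are hypothetical names.

```python
import math


def theta_of(m, n):
    """Argument of z = m + jn, using the case split given above."""
    if m == 0:
        # Not covered by arctan(n/m); added here so the sketch never divides by zero.
        return math.copysign(math.pi / 2, n)
    if m > 0:
        return math.atan(n / m)
    if n >= 0:                              # m < 0, n >= 0
        return math.atan(n / m) + math.pi
    return math.atan(n / m) - math.pi       # m < 0, n < 0


def nth_roots(m, n, N):
    """All N values of z**(1/N) for d = 0, 1, ..., N-1, as Python complex numbers."""
    rho = math.hypot(m, n)
    theta = theta_of(m, n)
    r = rho ** (1.0 / N)
    return [complex(r * math.cos((theta + 2 * d * math.pi) / N),
                    r * math.sin((theta + 2 * d * math.pi) / N))
            for d in range(N)]


for w in nth_roots(-3.0, 4.0, 5):
    print(w, w ** 5)  # each w ** 5 should be close to (-3+4j)
```

The case split is equivalent to the two-argument arctangent `math.atan2(n, m)`, which a software model could use directly.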
Further, the result processing unit uses shift-and-add operations and the trigonometric angle-sum formulas to obtain the real and imaginary parts of each root $\sqrt[N]{z}$. The system takes the real part m and the imaginary part n of z as inputs, and the result processing unit computes the real part
$$\operatorname{Re}\sqrt[N]{z} = \sqrt[N]{\rho}\cos\frac{\theta + 2d\pi}{N}$$
and the imaginary part
$$\operatorname{Im}\sqrt[N]{z} = \sqrt[N]{\rho}\sin\frac{\theta + 2d\pi}{N}.$$
Further, the N different values of d, d = 0, 1, ..., N-1, give the N roots of $\sqrt[N]{z}$. Once d is fixed,
$$\cos\frac{2d\pi}{N}$$
and
$$\sin\frac{2d\pi}{N}$$
are constants; they are computed in advance and stored in a lookup table.
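The parallel result processing can be modeled as follows: the shared CORDIC pipeline delivers $a = \sqrt[N]{\rho}\cos(\theta/N)$ and $b = \sqrt[N]{\rho}\sin(\theta/N)$ once, and each active result unit combines them with its own pair of lookup-table constants through the angle-sum formulas. This is a floating-point sketch; `MAX_UNITS`, `LUTS` and `result_units` are hypothetical names, and inactive units are represented by `None`.

```python
import math

MAX_UNITS = 10  # one result unit per possible root, since N <= 10

# Lookup table: for every N, the constants cos(2*d*pi/N) and sin(2*d*pi/N), d = 0..N-1.
LUTS = {N: [(math.cos(2 * math.pi * d / N), math.sin(2 * math.pi * d / N))
            for d in range(N)]
        for N in range(2, MAX_UNITS + 1)}


def result_units(a, b, N):
    """Activate the first N result units; unit d returns (real, imag) of root d."""
    outputs = [None] * MAX_UNITS
    for d, (c, s) in enumerate(LUTS[N]):
        # cos(x + y) = cos x cos y - sin x sin y ; sin(x + y) = sin x cos y + cos x sin y
        outputs[d] = (a * c - b * s, b * c + a * s)
    return outputs


# Example: z = 3 + 4j and N = 3, with a and b computed directly instead of by CORDIC.
rho, theta, N = 5.0, math.atan2(4.0, 3.0), 3
a = rho ** (1 / N) * math.cos(theta / N)
b = rho ** (1 / N) * math.sin(theta / N)
outs = result_units(a, b, N)
```

Only one CORDIC result per input is needed regardless of N; the per-root work reduces to two multiply-add pairs per active unit.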
Furthermore, the integer N input in each clock cycle can change dynamically between 2 and 10, and the result processing unit activates different result units according to the input N so that the N roots are computed in parallel. The invention thus dynamically supports complex roots of order 2 to 10, with up to 10 parallel computation paths.
Furthermore, the invention uses an efficient parallel result processing unit and, by exploiting the similarity among the computations of the N roots, shares the CORDIC modules as common computing resources, which reduces the hardware complexity of high-order N-th roots. The output of the CORDIC modules is passed to the parallel result processing unit, which keeps hardware complexity low. This addresses the high computational complexity and long computation time of N-th roots.
Further, each CORDIC module runs X iterations through an X-stage pipeline to converge, where the iterations comprise positive iterations, which determine the computational accuracy, and negative iterations, which expand the convergence range; X is a positive integer. Based on simulation results, 23 iterations are chosen for higher accuracy: each CORDIC module runs 23 iterations through a 23-stage pipeline, with a most negative iteration index of -2 and a largest positive iteration index of 20. The positive iterations determine the accuracy and the negative iterations expand the convergence range; the negative expansion differs slightly among the three coordinate systems:
(a) for the circular CORDIC algorithm, the sequence k = 0, 0, 0, 1, ..., 20 was found to perform better than k = -2, -1, 0, 1, ..., 20;
(b) for the linear CORDIC algorithm, the iteration index set is extended to k = -2, -1, 0, 1, ..., 20;
(c) for the hyperbolic CORDIC algorithm, the iteration index set is extended to k = -2, -1, 0, 1, ..., 20. Unlike the linear case, the negative iterations use an iteration formula that is separate from that of the positive iterations.
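The index sequences listed above can be written down and their convergence ranges checked numerically. The sketch assumes the standard per-iteration angles $\tan^{-1}2^{-k}$ (circular) and $2^{-k}$ (linear); the hyperbolic negative iterations are left out because their separate formula is not given here, and all names are illustrative only.

```python
import math

P = 20  # largest positive iteration index

CIRCULAR = [0, 0, 0] + list(range(1, P + 1))  # k = 0, 0, 0, 1, ..., 20
LINEAR = list(range(-2, P + 1))                # k = -2, -1, 0, 1, ..., 20


def circular_range(ks):
    """Largest |angle| in radians the circular iterations can absorb: sum of atan(2^-k)."""
    return sum(math.atan(2.0 ** -k) for k in ks)


def linear_range(ks):
    """Largest |y/x| (vectoring) or |z| (rotation) the linear iterations can absorb."""
    return sum(2.0 ** -k for k in ks)


print(len(CIRCULAR), circular_range(CIRCULAR))  # 23 stages, range just over pi
print(len(LINEAR), linear_range(LINEAR))        # 23 stages, range just under 8
```

With three repeated k = 0 stages the circular range (about 3.31 rad) covers every angle in $(-\pi, \pi]$, and extending the linear index set down to k = -2 raises the linear range from just under 2 to just under 8.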
Because the invention adopts a pipeline architecture, several iterations proceed simultaneously, which improves throughput. Taking the complex 10th root as an example, with the constants pre-computed, all 10 roots can be produced in one clock cycle.
Further, the constant values associated with N are computed in advance and stored in a lookup table, and the values corresponding to each integer N from 2 to 10 are read out as needed. During the computation, N-dependent constants such as
[formula image]
and
[formula image]
are pre-computed and stored in the lookup table, so that each iteration simply reads the corresponding constant. This saves hardware computation time, reduces hardware overhead and allows flexible computation.
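One possible way to organize the N-dependent constants is a single ROM holding $\cos(2d\pi/N)$ and $\sin(2d\pi/N)$ for every N from 2 to 10, addressed by a per-N base offset plus d. The 16 fractional bits, the flat layout and the names `ROM`, `BASE` and `lookup` are assumptions made for illustration, not details taken from the text.

```python
import math

FRAC = 16  # assumed number of fractional bits for the stored constants

BASE = {}  # start address of the entries for each N
ROM = []   # (cos, sin) pairs as signed fixed-point integers
for N in range(2, 11):
    BASE[N] = len(ROM)
    for d in range(N):
        angle = 2 * math.pi * d / N
        ROM.append((round(math.cos(angle) * (1 << FRAC)),
                    round(math.sin(angle) * (1 << FRAC))))


def lookup(N, d):
    """Fetch the fixed-point constants used by result unit d for root order N."""
    return ROM[BASE[N] + d]


print(len(ROM), BASE[10], lookup(4, 1))  # prints 54 44 (0, 65536)
```

Since $\sum_{N=2}^{10} N = 54$, this layout needs 54 constant pairs, and rounding keeps each stored value within $2^{-17}$ of the exact constant.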
3. Advantageous effects
Compared with the prior art, the invention discloses a CORDIC-based low-complexity hardware system, namely an architecture that dynamically supports complex roots of order 2 to 10. It effectively improves the input range, throughput and flexibility while reducing hardware complexity and resource consumption. Specifically:
(1) the invention enlarges the convergence range: software simulation supports inputs from $10^{-8}$ to $10^{4}$ with a relative error on the order of $10^{-6}$, giving a wide supported range and high accuracy;
(2) the invention adopts a pipeline architecture, so several iterations proceed simultaneously and throughput improves; N-dependent constants are pre-computed and stored in a lookup table, and the constants for each integer N from 2 to 10 are read out as needed, which further reduces hardware overhead and saves storage resources;
(3) the integer N input in each clock cycle can change dynamically between 2 and 10, and different result units are activated according to the input N so that the N results are computed in parallel, ensuring flexibility while improving computational efficiency;
(4) the invention shares computing resources by exploiting the similarity among the computations of the N roots of a complex N-th root, reducing the hardware complexity of high-order N-th roots, and relies on CORDIC so that the computation needs only simple shift and add operations. Taking the complex 10th root as an example, with pre-computed constants the design produces all 10 roots in one clock cycle, with only 0.157% more circuit area than a design that computes only 1 of the 10 roots per cycle; compared with the prior art, computation speed increases by up to 379 times.
Drawings
FIG. 1 is a hardware architecture diagram of the present invention;
FIG. 2 is a diagram of a parallel result processing unit architecture of the present invention;
FIG. 3 is a schematic diagram of a 23-stage pipeline architecture of the CV-CORDIC module of the present invention;
FIG. 4 is a graph of the simulated relationship of the average relative error, the integer value of N, and the number of positive iterations P in the present invention.
Detailed Description
The invention is described in detail below with reference to the drawings and specific examples. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the invention.
Examples
The low-complexity hardware system architecture of this embodiment is shown in FIG. 1. The system takes the real part m and the imaginary part n of the complex number z as inputs and processes them with CORDIC algorithms in the circular, linear and hyperbolic coordinate systems. As shown in the figure, the system comprises a circular vectoring mode CV-CORDIC module, a circular rotation mode CR-CORDIC module, a first linear vectoring mode LV1-CORDIC module, a second linear vectoring mode LV2-CORDIC module, a hyperbolic vectoring mode HV-CORDIC module and a hyperbolic rotation mode HR-CORDIC module. The input data first enter the CV-CORDIC module; the output of the CV-CORDIC module is fed to both the HV-CORDIC module and the LV2-CORDIC module; the output of the HV-CORDIC module passes through the LV1-CORDIC module and the HR-CORDIC module in turn and flows to the CR-CORDIC module; the output of the LV2-CORDIC module also flows to the CR-CORDIC module; and the output of the CR-CORDIC module is sent to the result processing unit, which uses shift-and-add operations to obtain
$$\operatorname{Re}\sqrt[N]{z} = \sqrt[N]{\rho}\cos\frac{\theta + 2d\pi}{N}$$
and
$$\operatorname{Im}\sqrt[N]{z} = \sqrt[N]{\rho}\sin\frac{\theta + 2d\pi}{N}.$$
As shown in the figure, the system uses a fully pipelined architecture and can therefore perform high-speed, fully pipelined computation.
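The dataflow of FIG. 1 can be mirrored by a behavioural model in which each CORDIC module is replaced by the ideal function it converges to. The operands fed to each module are not spelled out in this paragraph, so the assignment below is an assumption based on common CORDIC identities: HV-CORDIC is fed $(\rho+1, \rho-1)$ so that it yields $\tfrac12\ln\rho$, and HR-CORDIC is fed equal x and y so that it yields $e^{t}$. Gain compensation is omitted, and all function names are hypothetical.

```python
import math


def cv(m, n):
    """CV-CORDIC (circular vectoring): magnitude and angle of (m, n)."""
    return math.hypot(m, n), math.atan2(n, m)


def hv(rho):
    """HV-CORDIC on (rho + 1, rho - 1): atanh((rho - 1) / (rho + 1)) = ln(rho) / 2."""
    return math.atanh((rho - 1) / (rho + 1))


def lv(num, den):
    """LV-CORDIC (linear vectoring): the quotient num / den."""
    return num / den


def hr(t):
    """HR-CORDIC rotating (1, 1) by t: cosh t + sinh t = exp(t)."""
    return math.cosh(t) + math.sinh(t)


def cr(r, phi):
    """CR-CORDIC rotating (r, 0) by phi."""
    return r * math.cos(phi), r * math.sin(phi)


def principal_root(m, n, N):
    """Return rho^(1/N) * (cos(theta/N), sin(theta/N)) for a non-zero z = m + jn."""
    rho, theta = cv(m, n)
    t = lv(2 * hv(rho), N)  # LV1: ln(rho) / N
    r = hr(t)               # rho ** (1/N)
    phi = lv(theta, N)      # LV2: theta / N (delayed in hardware to meet the HR output)
    return cr(r, phi)


a, b = principal_root(3.0, 4.0, 3)
print(complex(a, b) ** 3)  # close to (3+4j)
```

In real hardware the hyperbolic stages converge only over a limited argument range, which is one reason the negative iterations matter for inputs spanning $10^{-8}$ to $10^{4}$.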
In this embodiment, five CORDIC modes across three coordinate systems are used to compute the N-th root of the complex number; each coordinate system has a rotation mode and a vectoring mode. The generalized CORDIC iteration is:
$$x_{k+1} = x_k - \mu d_k 2^{-k} y_k$$
$$y_{k+1} = y_k + d_k 2^{-k} x_k$$
$$z_{k+1} = z_k - d_k e_k$$
where k is the current iteration index and $d_k$ is the decision (direction) operator, whose value depends on the CORDIC operating mode: in rotation mode, $d_k = \operatorname{sign}(z_k)$, which drives $z_k$ toward 0; in vectoring mode, $d_k = -\operatorname{sign}(x_k y_k)$, which drives $y_k$ toward 0. In addition, the values of μ and $e_k$ depend on the coordinate system in which the CORDIC equations are written:
$$(\mu, e_k) = \begin{cases} (1,\ \tan^{-1} 2^{-k}) & \text{circular} \\ (0,\ 2^{-k}) & \text{linear} \\ (-1,\ \tanh^{-1} 2^{-k}) & \text{hyperbolic} \end{cases}$$
In the above formula, circular denotes the circular coordinate system, linear the linear coordinate system, and hyperbolic the hyperbolic coordinate system. Normally k starts from 0 (from 1 in the hyperbolic system, because $\tanh^{-1}(2^{0})$ is unbounded) and increases by 1 in each iteration; the computation range can be expanded by negative expansion, i.e., by starting k from a negative value. The hyperbolic system has one exception: at the fixed indices k = 4, 13, 40, ..., the iteration must be repeated once to guarantee convergence.
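The generalized iteration translates directly into a floating-point model of one CORDIC module. The sketch covers the circular (μ = 1) and linear (μ = 0) systems with the standard direction rules; the hyperbolic system is left out because its repeated indices and separate negative-iteration formula are not fully specified here. `cordic` is a hypothetical helper name, and no gain compensation is applied.

```python
import math


def cordic(x, y, z, mu, mode, ks):
    """Run the generalized CORDIC iteration over the index sequence ks.

    mu = 1: circular system, e_k = atan(2^-k); mu = 0: linear system, e_k = 2^-k.
    mode = "rotation" drives z to 0; mode = "vectoring" drives y to 0.
    """
    for k in ks:
        t = 2.0 ** -k
        e = math.atan(t) if mu == 1 else t
        if mode == "rotation":
            d = 1 if z >= 0 else -1          # d_k = sign(z_k)
        else:
            d = -1 if x * y >= 0 else 1      # d_k = -sign(x_k * y_k)
        x, y, z = x - mu * d * t * y, y + d * t * x, z - d * e
    return x, y, z


LINEAR_KS = range(-2, 21)
CIRCULAR_KS = [0, 0, 0] + list(range(1, 21))

print(cordic(2.0, 1.0, 0.0, 0, "vectoring", LINEAR_KS)[2])    # about 0.5 (y0 / x0)
print(cordic(3.0, 0.0, 1.25, 0, "rotation", LINEAR_KS)[1])    # about 3.75 (y0 + x0 * z0)
print(cordic(3.0, 4.0, 0.0, 1, "vectoring", CIRCULAR_KS)[2])  # about atan2(4, 3)
```

In circular vectoring mode the returned x equals the magnitude multiplied by the constant gain of the iteration sequence, which is why a scaling constant is needed afterwards.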
In this embodiment, each CORDIC operator runs 23 iterations to converge. In principle, any number of iterations could be used if accuracy were not a concern; 23 is used here as an example because software simulation showed that 23 iterations give high accuracy. The largest positive iteration index is chosen as 20, at which the relative error is on the order of $10^{-6}$, enough to meet typical accuracy requirements; the most negative iteration index is -2, which expands the input range to $10^{-8}$ to $10^{4}$ and covers N-th root computation for general data. After 23 iterations, the CORDIC outputs converge to the values shown in Table 1.
TABLE 1 CORDIC output convergence values for the three coordinate systems and two modes
[Table 1 image]
In Table 1, χ and λ are scaling factors used to correct the results. For the circular coordinate system,
[formula image: χ]
and for the hyperbolic coordinate system,
[formula image: λ]
Because the number of iterations is fixed, these scaling factors are constants that can be pre-computed in software and stored in a lookup table, avoiding complex hardware computation.
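Because the iteration sequences are fixed, such gain constants can be computed offline. Assuming the usual per-iteration gains $\sqrt{1+2^{-2k}}$ (circular) and $\sqrt{1-2^{-2k}}$ (hyperbolic, positive indices with k = 4 and 13 executed twice), a quick computation looks like the sketch below. The exact χ and λ formulas appear only as images in the original, so whether they equal these products or their reciprocals, and how the hyperbolic negative iterations contribute, are left open here.

```python
import math

CIRCULAR_KS = [0, 0, 0] + list(range(1, 21))
# Positive hyperbolic indices 1..20, with k = 4 and k = 13 executed twice.
HYPERBOLIC_POS_KS = [k for k in range(1, 21) for _ in range(2 if k in (4, 13) else 1)]

chi = math.prod(math.sqrt(1 + 4.0 ** -k) for k in CIRCULAR_KS)
lam_pos = math.prod(math.sqrt(1 - 4.0 ** -k) for k in HYPERBOLIC_POS_KS)

print(round(chi, 5), round(lam_pos, 5))
```

The three k = 0 stages each contribute a factor of $\sqrt{2}$, so the circular gain here is about twice the familiar value of roughly 1.6468.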
According to Table 1, the N-th root of the complex number $z = m + jn$ is computed with the CV-CORDIC, HV-CORDIC, LV1-CORDIC, HR-CORDIC, LV2-CORDIC and CR-CORDIC modules. The output of the CV-CORDIC module passes through a multi-stage buffer before entering the LV2-CORDIC module, so that it stays synchronized with the HR-CORDIC module. The connections and the input and output values of each CORDIC operator are shown in FIG. 1. Finally, the result processing unit uses shift-and-add operations to output
$$\sqrt[N]{\rho}\cos\frac{\theta + 2d\pi}{N}$$
and
$$\sqrt[N]{\rho}\sin\frac{\theta + 2d\pi}{N}.$$
The exponential form of z is $z = \rho e^{j(\theta + 2d\pi)}$, from which the N-th roots of z are obtained:
$$\sqrt[N]{z} = \sqrt[N]{\rho}\, e^{j\frac{\theta + 2d\pi}{N}} = \sqrt[N]{\rho}\left(\cos\frac{\theta + 2d\pi}{N} + j\sin\frac{\theta + 2d\pi}{N}\right)$$
where
$$\rho = \sqrt{m^2 + n^2}$$
and d = 0, 1, ..., N-1; when $m \ge 0$,
$$\theta = \arctan\frac{n}{m};$$
when $m < 0$ and $n \ge 0$,
$$\theta = \arctan\frac{n}{m} + \pi;$$
when $m < 0$ and $n < 0$,
$$\theta = \arctan\frac{n}{m} - \pi.$$
For the N different values of d, $\sqrt[N]{z}$ has N roots. Once d is fixed,
$$\cos\frac{2d\pi}{N}$$
and
$$\sin\frac{2d\pi}{N}$$
are constants that are stored in a lookup table in advance; the constants for each integer N between 2 and 10 are read out as needed, which saves hardware computation time.
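As shown in FIG. 2, the result processing unit has a parallel structure, and it uses the trigonometric angle-sum formulas to compute the real part and the imaginary part of $\sqrt[N]{z}$:
$$\operatorname{Re}\sqrt[N]{z} = \sqrt[N]{\rho}\cos\frac{\theta + 2d\pi}{N} = \sqrt[N]{\rho}\cos\frac{\theta}{N}\cos\frac{2d\pi}{N} - \sqrt[N]{\rho}\sin\frac{\theta}{N}\sin\frac{2d\pi}{N}$$
$$\operatorname{Im}\sqrt[N]{z} = \sqrt[N]{\rho}\sin\frac{\theta + 2d\pi}{N} = \sqrt[N]{\rho}\sin\frac{\theta}{N}\cos\frac{2d\pi}{N} + \sqrt[N]{\rho}\cos\frac{\theta}{N}\sin\frac{2d\pi}{N}$$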
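In fixed point, each product in these angle-sum formulas is a multiplication by a stored constant, which reduces to shifts and additions over the constant's set bits. The sketch assumes 16 fractional bits and signed integer operands; `shift_add_mul` and `result_unit` are hypothetical names, and a real design would likely recode the constants (for example in canonical signed digit form) to need fewer adders.

```python
FRAC = 16  # assumed fractional bits of the lookup-table constants


def shift_add_mul(x, c_fixed, frac=FRAC):
    """Compute (x * c_fixed) >> frac using only shifts, additions and a negation."""
    neg = c_fixed < 0
    c = -c_fixed if neg else c_fixed
    acc, bit = 0, 0
    while c:
        if c & 1:
            acc += x << bit  # one adder input per set bit of the constant
        c >>= 1
        bit += 1
    return (-acc if neg else acc) >> frac


def result_unit(a, b, c, s):
    """Real and imaginary parts of one root: (a*c - b*s, b*c + a*s), all fixed point."""
    return (shift_add_mul(a, c) - shift_add_mul(b, s),
            shift_add_mul(b, c) + shift_add_mul(a, s))
```

Here a and b stand for the two CR-CORDIC outputs and c, s for one lookup-table pair, all as integers scaled by $2^{16}$.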
A high-order N-th root has N values, and circuit complexity grows rapidly as N increases. Taking N = 10 as an example, computing the 10 results in parallel with 10 copies of the computing resources would multiply the hardware overhead by 10, while computing the 10 roots serially would take a long time. To reduce the complexity of high-order N-th roots, this embodiment shares the computing resources by exploiting the similarity among the computations of the N roots, and then feeds the shared result to the parallel result processing unit, thereby reducing the hardware complexity of high-order N-th roots.
The integer N input in each clock cycle can change dynamically between 2 and 10, and the parallel result processing unit activates different result units according to the input N so that the N roots are computed in parallel. In the example of FIG. 2, N = 3, and the result processing unit activates units result1 to result3 to compute the 3 roots simultaneously.
FIG. 3 shows the 23-stage pipeline architecture of the CV-CORDIC module of the present invention; the other CORDIC operators are structured similarly. In the first three iterations, which take the place of negative iterations, k = 0 and the iteration is:
$$x_{k+1} = x_k + \operatorname{sign}(y_k)\, y_k$$
$$y_{k+1} = y_k - \operatorname{sign}(y_k)\, x_k$$
$$z_{k+1} = z_k + \operatorname{sign}(y_k)\tan^{-1}(1)$$
In the remaining twenty forward iterations, k = 1, 2, ..., 20, and the iteration is:
$$x_{k+1} = x_k + \operatorname{sign}(y_k)\, y_k 2^{-k}$$
$$y_{k+1} = y_k - \operatorname{sign}(y_k)\, x_k 2^{-k}$$
$$z_{k+1} = z_k + \operatorname{sign}(y_k)\tan^{-1}(2^{-k})$$
that is, $d_k = -\operatorname{sign}(y_k)$, which rotates the vector toward the positive x axis and drives $y_k$ to 0. The values $\tan^{-1}(1)$ and $\tan^{-1}(2^{-k})$ are stored in a lookup table. In this embodiment, each CORDIC operator runs 23 iterations to converge, and the number of iterations matches the number of pipeline stages, so that a high-throughput fixed-point implementation of 2nd- to 10th-order root computation is obtained. Software simulation shows that a largest positive iteration index of 20 is sufficient for good accuracy, while the most negative iteration index is -2; the negative expansion differs slightly among the three coordinate systems:
(a) for the circular coordinate system CORDIC algorithm, the sequence k = 0, 0, 0, 1, ..., 20 was found to perform better than k = -2, -1, 0, 1, ..., 20;
(b) for the linear coordinate system CORDIC algorithm, the iteration index set is extended to k = -2, -1, 0, 1, ..., 20, as shown in FIG. 3;
(c) for the hyperbolic coordinate system CORDIC algorithm, the iteration index set is extended to k = -2, -1, 0, 1, ..., 20, as in the linear case; the difference is that in the hyperbolic system the negative iterations use an iteration formula that is separate from that of the positive iterations.
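The CV-CORDIC stages can be modeled at the bit level with integer shifts, one loop pass per pipeline stage. The 24 fractional bits, the rounded angle lookup table and the names `to_fixed` and `cv_cordic_fixed` are assumptions for illustration. The sketch uses $d_k = -\operatorname{sign}(y_k)$, i.e. it always rotates toward the positive x axis, so the angle accumulator ends at the full-range argument of $(m, n)$ thanks to the three k = 0 stages.

```python
import math

FRAC = 24                                # assumed fractional bits of the datapath
SEQ = [0, 0, 0] + list(range(1, 21))     # 23 stages: k = 0, 0, 0, 1, ..., 20
ATAN_LUT = [round(math.atan(2.0 ** -k) * (1 << FRAC)) for k in SEQ]


def to_fixed(v):
    """Convert a real number to a signed fixed-point integer."""
    return round(v * (1 << FRAC))


def cv_cordic_fixed(m, n):
    """Circular vectoring on fixed-point integers; returns (x, y, z) after 23 stages.

    x converges to (circular gain) * sqrt(m^2 + n^2), y to 0, z to atan2(n, m).
    """
    x, y, z = m, n, 0
    for k, atan_k in zip(SEQ, ATAN_LUT):  # each pass models one pipeline stage
        s = 1 if y >= 0 else -1           # s = sign(y_k), so d_k = -s
        # >> is an arithmetic shift, matching a two's-complement hardware shifter.
        x, y, z = x + s * (y >> k), y - s * (x >> k), z + s * atan_k
    return x, y, z


_, _, z = cv_cordic_fixed(to_fixed(-3.0), to_fixed(-4.0))
print(z / (1 << FRAC), math.atan2(-4.0, -3.0))
```

Because every stage uses the same shift-add-subtract structure with a fixed shift amount, each loop pass corresponds to one register stage of a 23-stage pipeline.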
In this embodiment, accuracy is determined by the number of positive iterations and the convergence range is expanded by the negative iterations, which gives the design considerable flexibility; 10,000 data points were simulated in MATLAB.
With the most negative iteration index fixed at -2 and the largest positive iteration index denoted P, the experiment examined how the average relative error depends on N and P.
N takes values from 2 to 10 and P varies from 15 to 20; the resulting relationship among the average relative error, N and P is shown in FIG. 4. When P = 20, the average relative error reaches $1.38 \times 10^{-6}$, and inputs in the range $10^{-8}$ to $10^{4}$ are supported.
This embodiment was modeled in Verilog HDL, and hardware simulation on 10,000 test data gave an average relative error of $2.9578 \times 10^{-6}$. Taking N = 10 as an example, an example circuit was synthesized in a TSMC 28 nm CMOS process; Table 2 compares the synthesis results and performance of this embodiment with those of earlier work.
TABLE 2
Design            Process   Architecture    Frequency   Area (μm²)   Computation latency   Accuracy (average relative error)
Earlier work 1    28 nm     Non-pipelined   1.5 GHz     6561         170.9 ns              $9.6117 \times 10^{-5}$
Earlier work 2    28 nm     Pipelined       2.218 GHz   67964.17     4.50 ns               $2.9660 \times 10^{-6}$
This invention    28 nm     Pipelined       2.218 GHz   68070.87     0.451 ns              $2.9578 \times 10^{-6}$
Earlier work 1 is the patent cited in the Background and is an earlier result of the applicant. It runs at 1.5 GHz without a pipeline architecture and can only compute the N-th roots of each input complex number serially according to N, producing only 1 of the N results at a time; for N = 10, computing the 10th roots of one complex number takes 255 clock cycles, i.e., 170.9 ns.
Earlier work 2 adds a pipeline architecture to earlier work 1 and runs at 2.218 GHz. For N = 10 it still computes only 1 of the 10 roots at a time, so computing the 10th roots of one complex number takes 10 clock cycles, i.e., 4.50 ns.
Building on earlier work 2, this embodiment adds a parallel result processing unit and reduces computational complexity by exploiting the similarity among the computations of the N roots of a complex N-th root. For N = 10, thanks to the pre-computed constants, computing the 10th roots of one complex number takes only 1 clock cycle, i.e., 0.451 ns.
The synthesis results in Table 2 show that, compared with earlier work 2, the circuit area of this embodiment increases by only 0.157% while the computation speed increases 10-fold. Compared with earlier work 1, the computational efficiency improves by up to 379 times, and the hardware accuracy improves by about one order of magnitude.
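The ratios quoted above follow directly from the entries of Table 2. The short calculation below reproduces them and also checks that the latencies of earlier work 2 and of this embodiment correspond to 10 cycles and 1 cycle at 2.218 GHz.

```python
area_prev2, area_this = 67964.17, 68070.87           # um^2, from Table 2
lat_prev1, lat_prev2, lat_this = 170.9, 4.50, 0.451  # ns, from Table 2
f_ghz = 2.218                                        # clock of the pipelined designs

area_increase_pct = (area_this / area_prev2 - 1) * 100
speedup_vs_prev2 = lat_prev2 / lat_this
speedup_vs_prev1 = lat_prev1 / lat_this

print(f"area +{area_increase_pct:.3f} %")              # area +0.157 %
print(f"speed-up vs work 2: {speedup_vs_prev2:.1f}x")
print(f"speed-up vs work 1: {speedup_vs_prev1:.0f}x")  # 379x
print(f"10 cycles: {10 / f_ghz:.2f} ns, 1 cycle: {1 / f_ghz:.3f} ns")
```

The speed-up over earlier work 2 comes out at about 9.98, which rounds to the 10-fold figure stated in the text.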
In summary, the present invention provides a low-hardware-complexity architecture that dynamically supports complex roots of order 2 to 10, building a hardware-efficient algorithm from the circular, linear and hyperbolic CORDIC systems. It expands the convergence range: software simulation supports inputs from $10^{-8}$ to $10^{4}$ with a relative error down to $10^{-6}$, giving a wide supported range and high accuracy. Its pipeline architecture enables high-speed, fully pipelined computation; it reduces computational complexity by exploiting the similarity among the computations of the N roots of a complex N-th root; and it dynamically supports complex roots of order 2 to 10, combining high efficiency, high accuracy and low hardware complexity.
The invention and its embodiments have been described above schematically and without limitation. Although the invention has been shown and described with reference to specific preferred embodiments, this should not be construed as limiting the invention. Various changes in form and details may be made without departing from its spirit or essential characteristics, and all matter contained in the accompanying claims is to be interpreted as illustrative and not in a limiting sense. Several of the elements described in this application may also be implemented by a single element in software or hardware. The terms first, second, etc. are used to denote names, not any particular order.

Claims (10)

1. A CORDIC-based low-complexity hardware system, characterized by comprising a circular vectoring mode CV-CORDIC module, a circular rotation mode CR-CORDIC module, a first linear vectoring mode LV1-CORDIC module, a second linear vectoring mode LV2-CORDIC module, a hyperbolic vectoring mode HV-CORDIC module, a hyperbolic rotation mode HR-CORDIC module and a result processing unit;
wherein the input data of the system is fed to the input of the CV-CORDIC module; the output of the CV-CORDIC module is connected to the inputs of the HV-CORDIC module and the LV2-CORDIC module; the output of the HV-CORDIC module is connected to the input of the LV1-CORDIC module; the output of the LV1-CORDIC module is connected to the input of the HR-CORDIC module; the outputs of the HR-CORDIC module and the LV2-CORDIC module are connected to the input of the CR-CORDIC module; the output of the CR-CORDIC module is connected to the input of the result processing unit; and the result processing unit outputs the calculation result.
2. The CORDIC-based low-complexity hardware system according to claim 1, wherein the system uses a pipelined architecture.
3. The CORDIC-based low-complexity hardware system according to claim 1, wherein the result processing unit comprises a plurality of parallel result units.
4. A method for computing the 2nd through 10th roots of a complex number using the CORDIC-based low-complexity hardware system according to any one of claims 1 to 3, wherein the complex number z is expressed as $z = m + jn$, the Nth root of z is computed from the exponential form of z, and N is an integer with $2 \le N \le 10$.
5. The method for complex 2nd- to 10th-root operations according to claim 4, wherein the result processing unit uses the angle-sum and angle-difference formulas of trigonometric functions to compute, with shift and add operations, the real and imaginary parts of $\sqrt[N]{z}$.
6. The method for complex 2nd- to 10th-root operations according to claim 5, wherein, for the N different values of d, the root operation yields the N roots $\sqrt[N]{z} = \sqrt[N]{|z|}\left(\cos\frac{\theta + 2d\pi}{N} + j\sin\frac{\theta + 2d\pi}{N}\right)$, $d = 0, 1, \ldots, N-1$, where $\theta$ is the argument of $z$; once the value of d is fixed, $\cos\frac{2d\pi}{N}$ and $\sin\frac{2d\pi}{N}$ are constants that are calculated in advance and stored in a look-up table.
7. The method according to claim 5, wherein the integer N input in each clock cycle can change dynamically between 2 and 10, and the result processing unit dynamically activates different result units according to the input integer N and completes the computation of the N roots in parallel.
8. The method according to claim 7, wherein the result processing unit is a parallel result processing unit that shares computing resources by exploiting the similarity of the N root-solving processes.
9. The method according to claim 5, wherein result convergence is achieved by performing X iterations in each CORDIC module through an X-stage pipeline architecture, the iterations comprising positive-index iterations that determine the calculation precision and negative-index iterations that expand the convergence range, X being an integer greater than zero.
10. The method according to claim 6, wherein constant values associated with N are stored in a look-up table, and the corresponding values are retrieved according to the integer N, which ranges from 2 to 10.
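The result processing unit described in claims 5 to 8 and 10 can be sketched as follows. Starting from the principal root a + jb produced by the CR-CORDIC module, each remaining root is obtained with the angle-sum formulas and one pair of precomputed constants per root. This is a minimal floating-point sketch under stated assumptions: the table name `LUT` and the function `all_roots` are hypothetical, `cmath` stands in for the CORDIC chain, and real hardware would presumably implement the constant multiplications as shift-and-add logic, as claim 5 indicates.

```python
import cmath
import math

# Hypothetical look-up table (claims 6 and 10): for each N in 2..10, the
# constants (cos(2*pi*d/N), sin(2*pi*d/N)) for d = 0..N-1, computed offline.
LUT = {N: [(math.cos(2 * math.pi * d / N), math.sin(2 * math.pi * d / N))
           for d in range(N)]
       for N in range(2, 11)}

def all_roots(a, b, N):
    """Expand the principal root a + jb into all N roots via angle-sum formulas.

    Root d has angle psi + 2*pi*d/N, so with c = cos(2*pi*d/N), s = sin(2*pi*d/N):
      Re = a*c - b*s
      Im = b*c + a*s
    Each (c, s) pair plays the role of one parallel "result unit"; only the
    first N units are used for a given N (claim 7).
    """
    if N not in LUT:
        raise ValueError("N must be an integer in 2..10")
    return [(a * c - b * s, b * c + a * s) for c, s in LUT[N]]

# Usage: the principal root from cmath stands in for the CORDIC chain output.
z, N = complex(3.0, 4.0), 4
w0 = cmath.exp(cmath.log(z) / N)
roots = all_roots(w0.real, w0.imag, N)
```

Because every result unit consumes the same inputs a and b and differs only in its constant pair, all N roots can be produced side by side in one cycle; this appears to be the similarity between the N root-solving processes that the description exploits.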
CN202111135783.0A 2021-09-27 2021-09-27 CORDIC-based low-complexity hardware system and application method Pending CN113778379A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111135783.0A CN113778379A (en) 2021-09-27 2021-09-27 CORDIC-based low-complexity hardware system and application method

Publications (1)

Publication Number Publication Date
CN113778379A true CN113778379A (en) 2021-12-10

Family

ID=78853695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111135783.0A Pending CN113778379A (en) 2021-09-27 2021-09-27 CORDIC-based low-complexity hardware system and application method

Country Status (1)

Country Link
CN (1) CN113778379A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160041947A1 (en) * 2014-08-05 2016-02-11 Imagination Technologies, Limited Implementing a square root operation in a computer system
CN111443893A (en) * 2020-04-28 2020-07-24 Nanjing University N-time root calculation device and method based on CORDIC algorithm
CN111984227A (en) * 2020-08-26 2020-11-24 Nanjing University Approximate calculation device and method for complex square root
CN112486455A (en) * 2020-11-27 2021-03-12 Nanjing University Hardware computing system and computing method for solving complex N-time root opening numbers based on CORDIC method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
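Zhang Wei; Zhang Antang; Xiao Yu: "Three-dimensional coordinate transformation based on the coordinate rotation digital computer (CORDIC) method", Journal of Detection & Control, no. 02, 26 April 2011 (2011-04-26) *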

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113778378A (en) * 2021-09-27 2021-12-10 Nanjing Ningqi Intelligent Computing Chip Research Institute Co ltd Device and method for solving complex number N-degree square root
CN113778378B (en) * 2021-09-27 2024-09-10 Nanjing Ningqi Intelligent Computing Chip Research Institute Co ltd Device and method for solving complex N times square root

Similar Documents

Publication Publication Date Title
Kumar FPGA implementation of the trigonometric functions using the CORDIC algorithm
Juang et al. A lower error and ROM-free logarithmic converter for digital signal processing applications
CN109739470B (en) Computing system based on arbitrary exponential function of type 2 hyperbolic CORDIC
US5060182A (en) Method and apparatus for performing the square root function using a rectangular aspect ratio multiplier
CN111078187B (en) Method for solving arbitrary root of square aiming at single-precision floating point number and solver thereof
JP4199100B2 (en) Function calculation method and function calculation circuit
CN111443893A (en) N-time root calculation device and method based on CORDIC algorithm
Hussain et al. An efficient and fast softmax hardware architecture (EFSHA) for deep neural networks
CN110187866B (en) Hyperbolic CORDIC-based logarithmic multiplication computing system and method
CN113778379A (en) CORDIC-based low-complexity hardware system and application method
CN113778378B (en) Device and method for solving complex N times square root
Pang et al. VHDL Modeling of Booth Radix-4 Floating Point Multiplier for VLSI Designer’s Library
CN111984226B (en) Cube root solving device and solving method based on hyperbolic CORDIC
Aslan et al. Realization of area efficient QR factorization using unified division, square root, and inverse square root hardware
Chen et al. A general methodology and architecture for arbitrary complex number Nth root computation
Chen et al. Low-complexity high-precision method and architecture for computing the logarithm of complex numbers
Naregal et al. Design and implementation of high efficiency vedic binary multiplier circuit based on squaring circuits
Nouri et al. Design and evaluation of correlation accelerator in IEEE-802.11 a/g receiver using a template-based coarse-grained reconfigurable array
CN107203491A (en) A kind of triangle systolic array architecture QR decomposers for FPGA
Rajaby et al. Hardware design and implementation of high-efficiency cube-root of complex numbers
Anuhya et al. ASIC implementation of efficient floating point multiplier
CN113515259B (en) Complex number approximate modulus realization circuit and method suitable for floating point format
Chang et al. Fixed-point computing element design for transcendental functions and primary operations in speech processing
Kaur et al. Implementation of High Speed Fixed Point CORDIC Techniques
CN110096677B (en) Quick calculation method and system for high-order derivative function based on probability calculation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination