CN113778379A

CN113778379A - CORDIC-based low-complexity hardware system and application method

Info

Publication number: CN113778379A
Application number: CN202111135783.0A
Authority: CN
Inventors: 李丽; 徐瑾; 傅玉祥; 陈辉; 蒋林; 武瑞琪; 何书专; 陈健
Original assignee: Nanjing Ningqi Intelligent Computing Chip Research Institute Co ltd
Current assignee: Nanjing Ningqi Intelligent Computing Chip Research Institute Co ltd
Priority date: 2021-09-27
Filing date: 2021-09-27
Publication date: 2021-12-10

Abstract

The invention discloses a low-complexity hardware system and application method based on CORDIC (coordinate rotation digital computer).

In the prior art, computing complex N-th roots in software is slow and inefficient. Computing them in hardware is complicated, and the number of results varies with N.

The invention builds its design from CORDIC operators in the circular, linear and hyperbolic coordinate systems, together with an efficient parallel result processing unit. It extends the calculation convergence range: software simulation supports inputs from 10^-8 to 10^4 with relative error on the order of 10^-6, so the supported range is wide and the accuracy is high.

The invention adopts a pipeline architecture and achieves high-speed, fully pipelined computation. It reduces computational complexity by exploiting the similarity among the computations of the N roots of a complex N-th root, and it dynamically supports complex N-th roots for N from 2 to 10. The method is efficient, accurate and low in hardware complexity.

Description

CORDIC-based low-complexity hardware system and application method

Technical Field

The invention relates to the technical field of complex N-th root computation, and in particular to a CORDIC-based low-complexity hardware system and application method.

Background

Complex arithmetic is a core part of circuit computation. It is widely used in communication systems and signal processing for real-time data representation and system modeling.

The N-th root is an important operation in complex function theory. It appears in polynomial computation, matrix computation, trigonometric functions and similar tasks, where it simplifies the computation.

However, complex N-th roots are expensive to compute, because the number of roots varies with N and complex arithmetic is itself complicated. Most research on N-th roots focuses on real numbers, and complex N-th roots are usually computed in software. Software implementations typically combine several general-purpose algorithms rather than one dedicated algorithm. This keeps the results accurate and reliable, but it introduces redundancy and gives poor real-time performance.

Another approach is to accelerate the N-th root with application-specific integrated circuit (ASIC) hardware to obtain high computational performance. However, only a few works address the hardware implementation of complex roots, and these mostly handle square roots only. Higher-order roots are widely used in fields such as atmospheric modeling and radiation, so square roots alone cannot meet application requirements.

In a concrete hardware implementation, the resources consumed by the N-dependent constants grow in proportion to the supported range of N. In practice, the most common values of N are the integers from 2 to 10.

A coordinate rotation digital computer (CORDIC) efficiently computes transcendental functions, such as trigonometric, exponential and logarithmic functions, using only shift and add operations. It achieves high speed with a good balance between accuracy and area, at low cost.

The invention therefore provides a low-complexity CORDIC-based hardware solution for computing complex N-th roots for N from 2 to 10. The solution achieves high computational efficiency while reducing hardware implementation complexity.

Because the number of results varies with N, N-th root computation has long been a challenging problem, and most hardware implementations focus on real N-th roots or on square roots only.

Chinese patent application No. CN202011357034.8, published on 2021-03-12, discloses a CORDIC-based method for computing complex N-th roots. It is an earlier result of the present applicant. The method supports arbitrary N, but it computes the roots of each input complex number serially according to N and produces only one of the N results at a time. An N-th root has N results, so the circuit complexity grows rapidly as N increases, and the method's efficiency and flexibility are limited.

Disclosure of Invention

1. Technical problem to be solved

In the prior art, computing complex N-th roots in software is slow and inefficient. Computing them in hardware is complicated, and the number of results varies with N. To address these problems, the invention provides a CORDIC (coordinate rotation digital computer) based low-complexity hardware system and application method.

The system and method adopt a pipeline architecture. They reduce computational cost by exploiting the similarity among the computations of the N roots. The result is a wide supported range, high efficiency, high accuracy and low hardware complexity.

2. Technical scheme

The purpose of the invention is realized by the following technical scheme.

A CORDIC-based low-complexity hardware system comprises a circular vectoring-mode CV-CORDIC module, a circular rotation-mode CR-CORDIC module, a first linear vectoring-mode LV1-CORDIC module, a second linear vectoring-mode LV2-CORDIC module, a hyperbolic vectoring-mode HV-CORDIC module, a hyperbolic rotation-mode HR-CORDIC module and a result processing unit;

the input data of the system are fed to the input of the CV-CORDIC module; the output of the CV-CORDIC module is connected to the inputs of the HV-CORDIC module and the LV2-CORDIC module; the output of the HV-CORDIC module is connected to the input of the LV1-CORDIC module; the output of the LV1-CORDIC module is connected to the input of the HR-CORDIC module; the outputs of the HR-CORDIC module and the LV2-CORDIC module are connected to the input of the CR-CORDIC module; the output of the CR-CORDIC module is connected to the input of the result processing unit; and the result processing unit outputs the computed result.

The invention uses an efficient parallel result processing unit and a pipeline architecture. It reduces computational complexity by exploiting the similarity among the computations of the N roots of a complex N-th root. It dynamically supports complex N-th roots for N from 2 to 10 and saves storage resources. It thereby overcomes the high computational complexity and long computation time of N-th roots.

Further, the system uses a pipelined architecture.

Further, the result processing unit comprises a plurality of parallel result units.

The invention builds an efficient hardware implementation from CORDIC modules in the circular, linear and hyperbolic coordinate systems and a parallel result processing unit. Its pipeline architecture enables high-speed, fully pipelined computation.

A method for computing complex N-th roots for N from 2 to 10 uses the above CORDIC-based low-complexity hardware system. The complex number z is expressed as z = m + jn, and its N-th roots are computed from the exponential form of the complex number, where N is an integer from 2 to 10 inclusive.
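A floating-point reference model is useful for checking any hardware implementation of this method. The sketch below evaluates the exponential form directly, assuming θ is the four-quadrant angle of z: arctan(n/m), shifted by +π or -π when m < 0. The helper `nth_roots_reference` is hypothetical and not part of the patent. Its m = 0 branch is also an assumption, because arctan(n/m) is undefined there.

```python
import math


def nth_roots_reference(m: float, n: float, N: int) -> list[complex]:
    """Return the N roots of z = m + jn in the order d = 0, 1, ..., N-1.

    Floating-point reference model (hypothetical helper, not the hardware).
    """
    if not 2 <= N <= 10:
        raise ValueError("N must be an integer from 2 to 10")
    rho = math.hypot(m, n)  # modulus of z
    # Four-quadrant angle theta, split on the signs of m and n
    if m > 0:
        theta = math.atan(n / m)
    elif m == 0:
        # Assumption: limit of arctan(n/m) as m -> 0+
        theta = math.copysign(math.pi / 2, n)
    elif n >= 0:
        theta = math.atan(n / m) + math.pi
    else:
        theta = math.atan(n / m) - math.pi
    r = rho ** (1.0 / N)  # rho^(1/N)
    roots = []
    for d in range(N):
        phi = (2 * d * math.pi + theta) / N
        roots.append(complex(r * math.cos(phi), r * math.sin(phi)))
    return roots


# Usage: the d = 0 cube root of -8 is approximately 1 + 1.732j
print(nth_roots_reference(-8.0, 0.0, 3)[0])
```

Each returned value raised to the N-th power should reproduce z up to rounding, which gives a simple pass/fail check for a hardware model.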

Specifically, the N-th roots of z are

$$\sqrt[N]{z} = \rho^{1/N} e^{j(2d\pi + \theta)/N}$$

where $\rho = \sqrt{m^2 + n^2}$ and d = 0, 1, …, N-1. When m ≥ 0,

$$\theta = \arctan(n/m)$$

when m < 0 and n ≥ 0,

$$\theta = \arctan(n/m) + \pi$$

and when m < 0 and n < 0,

$$\theta = \arctan(n/m) - \pi$$

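Further, the result processing unit uses the trigonometric angle-sum formulas, implemented with shift and add operations, to obtain the real and imaginary parts of $\rho^{1/N} e^{j(2d\pi+\theta)/N}$. The system takes the real part m and imaginary part n of the complex number z as input. The result processing unit then computes the real part

$$\rho^{1/N}\cos\frac{2d\pi+\theta}{N} = \rho^{1/N}\left(\cos\frac{\theta}{N}\cos\frac{2d\pi}{N} - \sin\frac{\theta}{N}\sin\frac{2d\pi}{N}\right)$$

and the imaginary part

$$\rho^{1/N}\sin\frac{2d\pi+\theta}{N} = \rho^{1/N}\left(\sin\frac{\theta}{N}\cos\frac{2d\pi}{N} + \cos\frac{\theta}{N}\sin\frac{2d\pi}{N}\right)$$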

Further, the N different values of d, d = 0, 1, …, N-1, give the N roots. Once d is fixed, $\cos(2d\pi/N)$ and $\sin(2d\pi/N)$ are constants, which are computed in advance and stored in a look-up table.
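The angle-sum step can be sketched in a few lines. All N roots share the same pair a = ρ^(1/N) cos(θ/N), b = ρ^(1/N) sin(θ/N), and each root needs only one look-up-table pair (cos(2dπ/N), sin(2dπ/N)). The table layout `ROOT_LUT` and the function `result_unit` are hypothetical illustrations in floating point, not the patent's fixed-point circuit.

```python
import math

# Hypothetical LUT layout: for each N in 2..10, the pairs
# (cos(2*d*pi/N), sin(2*d*pi/N)) for d = 0, 1, ..., N-1.
ROOT_LUT = {
    N: [(math.cos(2 * d * math.pi / N), math.sin(2 * d * math.pi / N)) for d in range(N)]
    for N in range(2, 11)
}


def result_unit(a: float, b: float, c: float, s: float) -> tuple[float, float]:
    """Angle-sum step of one result unit.

    a = rho^(1/N) * cos(theta/N), b = rho^(1/N) * sin(theta/N) (shared inputs);
    (c, s) = LUT pair for one value of d. Returns (real, imag) of root d.
    """
    real = a * c - b * s  # rho^(1/N) * cos(theta/N + 2*d*pi/N)
    imag = b * c + a * s  # rho^(1/N) * sin(theta/N + 2*d*pi/N)
    return real, imag


# Usage: all fifth roots of z = 3 + 4j from one shared (a, b) pair
N = 5
r, theta = 5.0 ** (1 / N), math.atan2(4.0, 3.0)
a, b = r * math.cos(theta / N), r * math.sin(theta / N)
roots = [complex(*result_unit(a, b, c, s)) for c, s in ROOT_LUT[N]]
print(roots)
```

Across N = 2 to 10, the table holds 54 (cos, sin) pairs. The d = 0 pair is always (1, 0), so a design could skip it, but the text does not say whether the patent does so.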

Furthermore, the integer N supplied in each clock cycle can change dynamically between 2 and 10. The result processing unit activates different result units according to the input N and computes the N roots in parallel. The invention thus dynamically supports complex N-th roots for N from 2 to 10, with up to 10 roots computed in parallel.

Furthermore, the invention uses an efficient parallel result processing unit that lets the N roots share the CORDIC modules as computing resources. This exploits the similarity among the computations of the N roots and reduces the hardware implementation complexity of high-order roots. The outputs of the shared CORDIC modules are passed to the parallel result processing unit, which produces the individual roots. This addresses the high computational complexity and long computation time of N-th roots.

Further, each CORDIC module performs X iterations in an X-stage pipeline to make the result converge, where X is a positive integer. The iterations include positive iterations, whose number determines the accuracy, and negative iterations, which extend the convergence range.

Based on simulation results, 23 iterations are chosen for high accuracy: each CORDIC module runs 23 iterations in a 23-stage pipeline. The most negative iteration index is -2 and the largest positive iteration index is 20. The negative-extension method differs slightly among the three coordinate systems:

(a) for the circular CORDIC algorithm, the index sequence k = 0, 0, 0, 1, …, 20 was found to perform better than k = -2, -1, 0, 1, …, 20;

(b) for the linear CORDIC algorithm, the iteration index set is extended to k = -2, -1, 0, 1, …, 20;

(c) for the hyperbolic CORDIC algorithm, the iteration index set is likewise extended to k = -2, -1, 0, 1, …, 20. Unlike the linear case, the negative iterations use an iteration formula separate from that of the positive iterations.
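The effect of the negative iterations can be quantified by summing the largest total rotation each index sequence can accumulate. The sketch below rests on two assumptions. First, for hyperbolic indices k ≤ 0 it uses the commonly used range-extension angle artanh(1 - 2^(k-2)), because the patent's own negative-iteration formula is not given. Second, it repeats the hyperbolic indices 4 and 13. All names are illustrative.

```python
import math

CIRCULAR_SEQ = [0, 0, 0] + list(range(1, 21))   # k = 0, 0, 0, 1, ..., 20
EXTENDED_SEQ = [-2, -1, 0] + list(range(1, 21))  # k = -2, -1, 0, 1, ..., 20


def circular_range(seq):
    """Largest angle (rad) reachable with elementary angles atan(2^-k)."""
    return sum(math.atan(2.0 ** -k) for k in seq)


def linear_range(seq):
    """Largest |y/x| (linear vectoring) or |z| (linear rotation) reachable."""
    return sum(2.0 ** -k for k in seq)


def hyperbolic_range(seq, repeats=(4, 13)):
    """Largest hyperbolic angle reachable; repeated indices counted twice.

    Assumption: indices k <= 0 use artanh(1 - 2^(k-2)), a commonly used
    range-extension scheme; the patent's own formula is not reproduced here.
    """
    total = 0.0
    for k in seq:
        if k <= 0:
            total += math.atanh(1.0 - 2.0 ** (k - 2))
        else:
            total += (2 if k in repeats else 1) * math.atanh(2.0 ** -k)
    return total


print(f"circular, k = 0,0,0,1..20:  {circular_range(CIRCULAR_SEQ):.4f} rad (pi = {math.pi:.4f})")
print(f"linear, k = -2..20:         {linear_range(EXTENDED_SEQ):.6f}")
print(f"hyperbolic, k = -2..20:     {hyperbolic_range(EXTENDED_SEQ):.4f}")
print(f"hyperbolic, k = 1..20 only: {hyperbolic_range(list(range(1, 21))):.4f}")
```

Under these assumptions, the results are as follows:
- The circular sequence covers about 3.31 rad, slightly more than π. This is consistent with a single circular vectoring stage resolving all four quadrants.
- The linear range grows from about 2 to about 8.
- The hyperbolic range grows from about 1.12 to about 5.16.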

The pipeline architecture processes multiple iterations concurrently, which raises throughput. For the complex 10th root, for example, precomputation lets the design compute all 10 roots in one clock cycle.
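A small cycle-level model shows why a 23-stage pipeline raises throughput. One input enters per clock cycle and leaves a fixed number of cycles later, so after the initial fill a new input completes every cycle. The function `simulate_pipeline` and the job tuples are hypothetical illustrations, not the patent's RTL.

```python
from collections import deque


def simulate_pipeline(inputs, stages=23):
    """Cycle-level model of a `stages`-deep pipeline.

    One item enters per clock cycle; an item entering at cycle c leaves at
    cycle c + stages. Returns a list of (exit_cycle, item) pairs.
    """
    regs = deque([None] * stages)            # pipeline registers, None = bubble
    stream = list(inputs) + [None] * stages  # trailing bubbles flush the pipe
    outputs = []
    for cycle, item in enumerate(stream):
        leaving = regs.popleft()             # value in the last stage
        regs.append(item)                    # new input enters the first stage
        if leaving is not None:
            outputs.append((cycle, leaving))
    return outputs


# Usage: three complex inputs, each with its own N, issued on consecutive cycles
jobs = [("z0", 3), ("z1", 10), ("z2", 7)]
for exit_cycle, job in simulate_pipeline(jobs):
    print(exit_cycle, job)
```

The latency stays at the pipeline depth, but the interval between consecutive results is one cycle. This is the property the paragraph above relies on.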

Further, constant values related to N are computed in advance and stored in a look-up table, and the values for each integer N from 2 to 10 are retrieved as needed. Values such as $\cos(2d\pi/N)$ and $\sin(2d\pi/N)$, and other constants related to N, are precomputed and stored in the look-up table, and the constants are read as needed in each iteration. This saves hardware computation time, reduces hardware overhead and allows flexible computation.

3. Advantageous effects

Compared with the prior art, the invention discloses a CORDIC-based low-complexity hardware architecture that dynamically supports complex N-th roots for N from 2 to 10. It effectively improves the input range, throughput and flexibility while reducing hardware complexity and resource consumption. Specifically:

(1) the invention extends the convergence range: software simulation supports inputs from 10^-8 to 10^4 with relative error on the order of 10^-6, giving a wide supported range and high accuracy;

(2) the pipeline architecture processes multiple iterations concurrently and improves throughput. Constant values related to N are precomputed and stored in a look-up table, and the constants for each integer N from 2 to 10 are retrieved as needed. This further reduces hardware overhead and saves storage resources;

(3) the integer N supplied in each clock cycle can change dynamically between 2 and 10. Different result units are activated according to the input N, and the N results are computed in parallel, which preserves flexibility while improving efficiency;

(4) the N roots of a complex N-th root share computing resources, because their computations are similar. This reduces the hardware complexity of high-order roots, and the CORDIC property allows the solution to be obtained with simple shift and add operations. For the complex 10th root, for example, precomputation lets the design compute all 10 roots in one clock cycle. The circuit area grows by only 0.157% compared with the earlier pipelined design that computes one root per cycle (earlier work 2 in Table 2), and computation is up to 379 times faster than the prior art.

Drawings

FIG. 1 is a hardware architecture diagram of the present invention;

FIG. 2 is a diagram of a parallel result processing unit architecture of the present invention;

FIG. 3 is a schematic diagram of a 23-stage pipeline architecture of the CV-CORDIC module of the present invention;

FIG. 4 is a simulated plot of the average relative error versus the integer N and the number of positive iterations P in the present invention.

Detailed Description

The invention is described in detail below with reference to the drawings and specific examples. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the invention.

Examples

The architecture of the low-complexity hardware system of this embodiment is shown in FIG. 1. The system takes the real part m and imaginary part n of the complex number z as input and processes them with CORDIC algorithms in the circular, linear and hyperbolic coordinate systems. As shown in the figure, the system comprises:
- a circular vectoring-mode CV-CORDIC module;
- a circular rotation-mode CR-CORDIC module;
- a first linear vectoring-mode LV1-CORDIC module;
- a second linear vectoring-mode LV2-CORDIC module;
- a hyperbolic vectoring-mode HV-CORDIC module;
- a hyperbolic rotation-mode HR-CORDIC module.

The input data first enter the CV-CORDIC module, whose output is sent to both the HV-CORDIC and LV2-CORDIC modules. The HV-CORDIC output passes through the LV1-CORDIC and HR-CORDIC modules in turn and reaches the CR-CORDIC module, which also receives the LV2-CORDIC output. The CR-CORDIC output is sent to the result processing unit. Using shift and add operations, the result processing unit obtains the real part $\rho^{1/N}\cos\frac{2d\pi+\theta}{N}$ and the imaginary part $\rho^{1/N}\sin\frac{2d\pi+\theta}{N}$ of each root.

As shown in the figure, the system uses a fully pipelined architecture and achieves high-speed, fully pipelined computation.
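The data flow of FIG. 1 can be summarized with a floating-point model in which each module is replaced by the value it converges to. The quantity each module computes is an assumption, since the recovered text gives only the connections. The assumed mapping is:
- CV yields ρ and θ;
- HV yields ln ρ (from the hyperbolic vectoring of (ρ + 1, ρ - 1));
- LV1 divides ln ρ by N;
- HR exponentiates;
- LV2 divides θ by N;
- CR rotates ρ^(1/N) by θ/N.

All function names are hypothetical. The model ignores fixed-point effects and the limited CORDIC convergence ranges.

```python
import math


def cv(m, n):
    """CV-CORDIC (gain-corrected): modulus rho and angle theta of m + jn."""
    return math.hypot(m, n), math.atan2(n, m)


def hv(rho):
    """HV-CORDIC on (rho + 1, rho - 1): z -> artanh((rho-1)/(rho+1)) = ln(rho)/2."""
    return 2.0 * math.atanh((rho - 1.0) / (rho + 1.0))  # doubled to give ln(rho)


def lv(y, x):
    """LV-CORDIC (linear vectoring): z -> y / x."""
    return y / x


def hr(t):
    """HR-CORDIC rotating (1, 0) by t: x -> cosh t, y -> sinh t; x + y = e^t."""
    return math.cosh(t) + math.sinh(t)


def cr(r, phi):
    """CR-CORDIC (gain-corrected): rotate (r, 0) by phi."""
    return r * math.cos(phi), r * math.sin(phi)


def complex_nth_roots(m, n, N):
    """Dataflow of FIG. 1 for z = m + jn != 0 and 2 <= N <= 10 (hypothetical model)."""
    rho, theta = cv(m, n)
    r = hr(lv(hv(rho), N))   # HV -> LV1 -> HR: rho^(1/N)
    phi = lv(theta, N)       # LV2 (buffered in hardware to meet HR): theta / N
    a, b = cr(r, phi)        # rho^(1/N) * (cos(theta/N), sin(theta/N))
    roots = []
    for d in range(N):       # result processing unit: angle-sum with LUT constants
        c, s = math.cos(2 * d * math.pi / N), math.sin(2 * d * math.pi / N)
        roots.append(complex(a * c - b * s, b * c + a * s))
    return roots


# Usage: the d = 0 fourth root of 16j is approximately 2 * exp(j*pi/8)
print(complex_nth_roots(0.0, 16.0, 4)[0])
```

In this model, the HV -> LV1 -> HR branch and the LV2 branch are independent until CR. This matches the buffering of the CV-CORDIC output before LV2 described below.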

In this embodiment, five CORDIC algorithms spanning three coordinate systems are used to compute the complex N-th root. Each coordinate system has a rotation mode and a vectoring mode, and the generalized CORDIC iteration is:

$$x_{k+1} = x_k - \mu d_k 2^{-k} y_k$$

$$y_{k+1} = y_k + d_k 2^{-k} x_k$$

$$z_{k+1} = z_k - d_k e_k$$

where k is the iteration index and $d_k$ is the decision (direction) operator. The value of $d_k$ depends on the CORDIC operating mode:
- in rotation mode, $d_k = \mathrm{sign}(z_k)$, which drives $z_k$ toward zero;
- in vectoring mode, $d_k = -\mathrm{sign}(x_k y_k)$, which drives $y_k$ toward zero.

In addition, μ and $e_k$ depend on the coordinate system used to describe the CORDIC equations:

$$(\mu, e_k) = \begin{cases} (1,\ \tan^{-1} 2^{-k}) & \text{circular} \\ (0,\ 2^{-k}) & \text{linear} \\ (-1,\ \tanh^{-1} 2^{-k}) & \text{hyperbolic} \end{cases}$$

Here circular, linear and hyperbolic denote the circular, linear and hyperbolic coordinate systems. Normally k starts at 0 and increases by 1 in each iteration. The computation range can be extended by negative extension, that is, by starting k from a negative value. The hyperbolic coordinate system is an exception: the iterations with the fixed indices k = 4, 13 and 40 must each be repeated once to ensure convergence.
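The generalized iteration can be exercised in floating point to check signs and scale factors. The sketch below implements the three update equations with the following choices:
- the standard values of μ and e_k;
- d_k = sign(z_k) in rotation mode and d_k = -sign(x_k y_k) in vectoring mode, with sign(0) taken as +1;
- the hyperbolic indices 4, 13 and 40 executed twice.

It is a hypothetical model, without the negative-index extension or fixed-point arithmetic.

```python
import math

MU = {"circular": 1.0, "linear": 0.0, "hyperbolic": -1.0}


def sign(v: float) -> float:
    return 1.0 if v >= 0 else -1.0  # sign(0) = +1, as a sign-bit test would give


def elementary(system: str, k: int) -> float:
    """e_k for the coordinate system (the hyperbolic case requires k >= 1)."""
    t = 2.0 ** -k
    if system == "circular":
        return math.atan(t)
    if system == "linear":
        return t
    return math.atanh(t)


def cordic(x, y, z, system, mode, schedule):
    """Generalized CORDIC: x' = x - mu*d*2^-k*y, y' = y + d*2^-k*x, z' = z - d*e_k."""
    mu = MU[system]
    for k in schedule:
        t = 2.0 ** -k
        d = sign(z) if mode == "rotation" else -sign(x * y)
        x, y, z = x - mu * d * t * y, y + d * t * x, z - d * elementary(system, k)
    return x, y, z


def hyperbolic_schedule(kmax=20, repeats=(4, 13, 40)):
    """k = 1..kmax, with indices 4, 13 and 40 executed twice for convergence."""
    seq = []
    for k in range(1, kmax + 1):
        seq.extend([k, k] if k in repeats else [k])
    return seq


# Usage: linear vectoring divides, z -> z0 + y0/x0 (close to 0.75 here)
print(cordic(4.0, 3.0, 0.0, "linear", "vectoring", range(0, 21))[2])
```

Circular and hyperbolic results carry the accumulated scale factor, so the start vector is pre-divided by the gain when cos/sin or cosh/sinh are wanted directly.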

In this embodiment, each CORDIC operator performs 23 iterations to make the result converge. In principle, the number of iterations could take any value if accuracy were not a concern. Here 23 is used as an example, and software simulation shows that 23 iterations give high accuracy.

The largest positive iteration index is chosen as 20, which gives relative error on the order of 10^-6 and meets the accuracy requirement. The most negative iteration index is -2, which extends the input range to 10^-8 to 10^4 and covers N-th root computation on typical data. After 23 iterations, the CORDIC equations converge to the values shown in Table 1.

TABLE 1 CORDIC output convergence values for three coordinate systems and two modes

In Table 1, χ and λ are the scaling factors used to correct the results in the circular and hyperbolic coordinate systems, respectively. Because the number of iterations is known, these scaling factors are constants that can be precomputed in software and stored in a look-up table, avoiding complex hardware computation.
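Because the iteration schedule is fixed, the scaling factors reduce to products that can be evaluated once in software. The sketch below computes the vector growth for a circular schedule k = 0, 0, 0, 1, …, 20 and a hyperbolic schedule k = -2, -1, 0, 1, …, 20 with k = 4 and 13 repeated. The reciprocal of each growth is the correcting multiplier; which of the two the patent's χ and λ denote cannot be recovered from the text. The hyperbolic factor for k ≤ 0, sqrt(1 - (1 - 2^(k-2))^2), is an assumption tied to a common range-extension scheme, not a formula from the patent.

```python
import math

CIRCULAR_SEQ = [0, 0, 0] + list(range(1, 21))  # k = 0, 0, 0, 1, ..., 20
HYPERBOLIC_SEQ = [-2, -1, 0] + [k for k in range(1, 21) for _ in range(2 if k in (4, 13) else 1)]


def circular_gain(seq):
    """Growth of |(x, y)| over a circular schedule: prod of sqrt(1 + 2^(-2k))."""
    return math.prod(math.sqrt(1.0 + 4.0 ** -k) for k in seq)


def hyperbolic_gain(seq):
    """Growth over a hyperbolic schedule.

    k >= 1: sqrt(1 - 2^(-2k)). k <= 0: sqrt(1 - (1 - 2^(k-2))^2), which assumes
    a step of tanh = 1 - 2^(k-2) for negative indices (not from the patent).
    """
    g = 1.0
    for k in seq:
        t = 1.0 - 2.0 ** (k - 2) if k <= 0 else 2.0 ** -k
        g *= math.sqrt(1.0 - t * t)
    return g


chi_gain = circular_gain(CIRCULAR_SEQ)
lam_gain = hyperbolic_gain(HYPERBOLIC_SEQ)
print(f"circular gain {chi_gain:.10f}, correction {1 / chi_gain:.10f}")
print(f"hyperbolic gain {lam_gain:.10f}, correction {1 / lam_gain:.10f}")
```

The three repeated k = 0 circular stages roughly double the classic circular gain of about 1.6468. The hyperbolic gain over k ≥ 1 alone is about 0.8282, close to the familiar value.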

According to Table 1, the N-th roots of the complex number z = m + jn are computed with the CV-CORDIC, HV-CORDIC, LV1-CORDIC, HR-CORDIC, LV2-CORDIC and CR-CORDIC modules. The CV-CORDIC result passes through a multi-stage buffer unit before entering the LV2-CORDIC module, so that it stays synchronized with the HR-CORDIC module. The connections and the input and output values of each CORDIC operator are shown in FIG. 1. Finally, using shift and add operations, the result processing unit outputs the real part $\rho^{1/N}\cos\frac{2d\pi+\theta}{N}$ and the imaginary part $\rho^{1/N}\sin\frac{2d\pi+\theta}{N}$ of each root.

The exponential form of z is $z = \rho e^{j(2d\pi+\theta)}$, from which the N-th roots of z are obtained:

$$\sqrt[N]{z} = \rho^{1/N} e^{j(2d\pi+\theta)/N}$$

where $\rho = \sqrt{m^2+n^2}$ and d = 0, 1, …, N-1. The angle θ depends on the signs of m and n:
- when m ≥ 0, $\theta = \arctan(n/m)$;
- when m < 0 and n ≥ 0, $\theta = \arctan(n/m) + \pi$;
- when m < 0 and n < 0, $\theta = \arctan(n/m) - \pi$.

The N different values of d give the N roots. Once d is fixed, $\cos(2d\pi/N)$ and $\sin(2d\pi/N)$ are constants that are stored in a look-up table in advance. The constants for each integer N from 2 to 10 are retrieved as needed, saving hardware computation time.

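As shown in FIG. 2, the result processing unit has a parallel structure. Its result units use the trigonometric angle-sum formulas to compute the real part

$$\rho^{1/N}\cos\frac{2d\pi+\theta}{N} = \rho^{1/N}\cos\frac{\theta}{N}\cos\frac{2d\pi}{N} - \rho^{1/N}\sin\frac{\theta}{N}\sin\frac{2d\pi}{N}$$

and the imaginary part

$$\rho^{1/N}\sin\frac{2d\pi+\theta}{N} = \rho^{1/N}\sin\frac{\theta}{N}\cos\frac{2d\pi}{N} + \rho^{1/N}\cos\frac{\theta}{N}\sin\frac{2d\pi}{N}$$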

A high-order N-th root has N roots, so circuit complexity grows rapidly with N. For N = 10, computing the 10 results in parallel with 10 copies of the computing resources would cost 10 times the hardware, while computing the 10 roots serially would take a long time.

To reduce the complexity of high-order roots, this embodiment shares the computing resources among the N roots, exploiting the similarity among their computations. The shared results are then fed to the parallel result processing unit for parallel processing, which reduces the hardware implementation complexity of high-order roots.

The integer N supplied in each clock cycle can change dynamically between 2 and 10. The parallel result processing unit activates different result units according to the input N and computes the N roots in parallel. In the example of FIG. 2, N = 3, so the result processing unit activates units result1 to result3 to compute the 3 roots simultaneously.
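The per-cycle activation of result units can be modeled as an enable mask over ten units. In each cycle, units result1 to resultN turn the shared pair (a, b) into one root each, and the remaining units stay idle. The function `result_stage_cycle` is a hypothetical sketch; in hardware the cos/sin values would be look-up-table reads rather than computed.

```python
import math

NUM_UNITS = 10  # result1 ... result10


def result_stage_cycle(a: float, b: float, N: int) -> list:
    """One clock cycle of a parallel result stage (hypothetical model).

    Units result1..resultN are enabled and each produces one root from the shared
    (a, b) = (rho^(1/N) cos(theta/N), rho^(1/N) sin(theta/N)); the rest output None.
    """
    outputs = []
    for unit in range(NUM_UNITS):
        if unit < N:  # enable signal for unit result{unit + 1}
            c = math.cos(2 * unit * math.pi / N)  # in hardware: a LUT read
            s = math.sin(2 * unit * math.pi / N)
            outputs.append(complex(a * c - b * s, b * c + a * s))
        else:
            outputs.append(None)  # idle unit
    return outputs


# Usage: N changes from cycle to cycle (here 3, then 10) for the same z = -1
for N in (3, 10):
    r, theta = 1.0, math.pi
    outs = result_stage_cycle(r * math.cos(theta / N), r * math.sin(theta / N), N)
    print(N, sum(o is not None for o in outs))
```

Because N is an argument of each call rather than a fixed parameter, consecutive cycles can use different N with no reconfiguration step.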

FIG. 3 shows the 23-stage pipeline architecture of the CV-CORDIC module of the present invention; the other CORDIC operators are structured similarly. Each stage rotates $(x_k, y_k)$ toward the positive x-axis, driving $y_k$ toward zero, and accumulates the rotation angle in $z_k$. In the first three (negative-extension) iterations, k = 0 and the iteration is:

$$x_{k+1} = x_k + \mathrm{sign}(y_k)\, y_k$$

$$y_{k+1} = y_k - \mathrm{sign}(y_k)\, x_k$$

$$z_{k+1} = z_k + \mathrm{sign}(y_k) \tan^{-1}(1)$$

In the remaining twenty positive iterations, k = 1, 2, …, 20 and the iteration is:

$$x_{k+1} = x_k + \mathrm{sign}(y_k)\, y_k 2^{-k}$$

$$y_{k+1} = y_k - \mathrm{sign}(y_k)\, x_k 2^{-k}$$

$$z_{k+1} = z_k + \mathrm{sign}(y_k) \tan^{-1}(2^{-k})$$

Here $\tan^{-1}(1)$ and $\tan^{-1}(2^{-k})$ are stored in a look-up table. In this embodiment, each CORDIC operator performs 23 iterations to make the result converge, and the number of iterations matches the number of pipeline stages. This yields a high-throughput fixed-point implementation of complex 2nd- to 10th-order roots.

Software simulation shows that a largest positive iteration index of 20 is sufficient for good accuracy, and the most negative iteration index is -2. The negative-extension method differs slightly among the three coordinate systems:

(a) for the circular-coordinate CORDIC algorithm, the index sequence k = 0, 0, 0, 1, …, 20 was found to perform better than k = -2, -1, 0, 1, …, 20;

(b) for the linear-coordinate CORDIC algorithm, the iteration index set is extended to k = -2, -1, 0, 1, …, 20, as shown in FIG. 3;

(c) for the hyperbolic-coordinate CORDIC algorithm, the iteration index set is extended to k = -2, -1, 0, 1, …, 20, as in the linear case. The difference is that in the hyperbolic algorithm the negative iterations use an iteration formula separate from that of the positive iterations.
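A floating-point model of the 23-stage CV-CORDIC shows how the three k = 0 stages extend circular vectoring to all four quadrants. Each stage rotates the vector toward the positive x-axis, opposite to sign(y_k), and accumulates the rotated angle. As a result, z converges to the four-quadrant angle of m + jn, and x converges to the gain-scaled modulus. The names are hypothetical; word lengths, rounding and the pipeline registers are not modeled.

```python
import math

CV_SCHEDULE = [0, 0, 0] + list(range(1, 21))            # one entry per pipeline stage
ATAN_LUT = [math.atan(2.0 ** -k) for k in CV_SCHEDULE]  # tan^-1(1) x3, then tan^-1(2^-k)
CV_GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -k) for k in CV_SCHEDULE)


def cv_cordic_stages(m: float, n: float) -> tuple[float, float]:
    """Return (|z|, angle of z) for z = m + jn via 23 circular vectoring stages."""
    x, y, z = m, n, 0.0
    for k, angle in zip(CV_SCHEDULE, ATAN_LUT):
        s = 1.0 if y >= 0 else -1.0  # sign(y_k)
        t = 2.0 ** -k                # a right shift by k bits in hardware
        x, y, z = x + s * y * t, y - s * x * t, z + s * angle
    return x / CV_GAIN, z            # remove the accumulated gain


for m, n in [(3.0, 4.0), (-3.0, 4.0), (-3.0, -4.0), (0.0, -2.0)]:
    mag, ang = cv_cordic_stages(m, n)
    print(f"{m:+.1f}{n:+.1f}j: |z| = {mag:.6f}, angle = {ang:+.6f}, atan2 = {math.atan2(n, m):+.6f}")
```

The three 45-degree stages can remove up to 135 degrees before the shrinking stages start. This is why inputs in the second and third quadrants converge without a separate quadrant pre-rotation.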

In this embodiment, accuracy is determined by the number of positive iterations, and the convergence range is extended by the negative iterations, which gives considerable flexibility. A set of 10,000 data points was simulated in MATLAB.

With the most negative iteration index fixed at -2 and the largest positive iteration index denoted P, the experiment examined how the average relative error depends on the integer N and on P.

N takes values from 2 to 10, and P varies from 15 to 20. The resulting relationship among the average relative error, N and P is shown in FIG. 4. With P = 20, the average relative error reaches 1.38 × 10^-6 over the supported input range of 10^-8 to 10^4.

This embodiment was modeled in Verilog HDL and simulated in hardware on 10,000 test vectors, giving an average relative error of 2.9578 × 10^-6. For N = 10, an example circuit was synthesized in a TSMC 28 nm CMOS process. Table 2 compares the synthesis results and performance of this embodiment with those of earlier work.

TABLE 2 Synthesis results and performance comparison (N = 10)

Design           Process  Architecture    Frequency  Area (μm²)  Computation delay  Accuracy
Earlier work 1   28 nm    Non-pipelined   1.5 GHz    6561        170.9 ns           9.6117 × 10^-5
Earlier work 2   28 nm    Pipelined       2.218 GHz  67964.17    4.50 ns            2.9660 × 10^-6
This invention   28 nm    Pipelined       2.218 GHz  68070.87    0.451 ns           2.9578 × 10^-6

Earlier work 1 is the patent cited in the Background, an earlier result of the applicant. It operates at 1.5 GHz without a pipeline architecture. It computes the N-th roots of each input complex number serially according to N, producing only one of the N results at a time. For N = 10, computing one complex 10th root therefore takes 255 clock cycles, or 170.9 ns.

Earlier work 2 adds a pipeline architecture to earlier work 1 and operates at 2.218 GHz. For N = 10, it still computes only one of the 10 roots at a time, so computing one complex 10th root takes 10 clock cycles, or 4.5 ns.

This embodiment adds a parallel result processing unit to earlier work 2. It reduces computational complexity by exploiting the similarity among the computations of the N roots. For N = 10, precomputation means that computing one complex 10th root takes only 1 clock cycle, or 0.45 ns.

According to the synthesis results in Table 2, the circuit area of this embodiment is only 0.157% larger than that of earlier work 2, while computation is 10 times faster. Compared with earlier work 1, the computational efficiency of this embodiment improves by up to 379 times, and the hardware accuracy improves by about one order of magnitude.
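The percentages and speed-ups quoted above follow directly from Table 2 and can be rechecked with a few lines of arithmetic. For the two pipelined designs, the delays also agree closely with cycle count divided by clock frequency: 10 cycles and 1 cycle at 2.218 GHz. The constant names are illustrative.

```python
# Values copied from Table 2 (N = 10)
AREA_WORK2, AREA_THIS = 67964.17, 68070.87                 # um^2
DELAY_WORK1, DELAY_WORK2, DELAY_THIS = 170.9, 4.50, 0.451  # ns
F_PIPELINED = 2.218e9                                      # Hz


def delay_ns(cycles: int, freq_hz: float) -> float:
    """Latency of `cycles` clock cycles at `freq_hz`, in nanoseconds."""
    return cycles / freq_hz * 1e9


area_increase_pct = (AREA_THIS - AREA_WORK2) / AREA_WORK2 * 100
print(f"area increase vs earlier work 2: {area_increase_pct:.3f} %")
print(f"speed-up vs earlier work 2: {DELAY_WORK2 / DELAY_THIS:.1f}x")
print(f"speed-up vs earlier work 1: {DELAY_WORK1 / DELAY_THIS:.0f}x")
print(f"10 cycles at 2.218 GHz: {delay_ns(10, F_PIPELINED):.1f} ns, "
      f"1 cycle: {delay_ns(1, F_PIPELINED):.3f} ns")
```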

In summary, the invention provides a low-hardware-complexity architecture that dynamically supports complex N-th roots for N from 2 to 10. It builds a hardware-efficient algorithm from CORDIC operators in the circular, linear and hyperbolic coordinate systems, and it extends the convergence range: software simulation supports inputs from 10^-8 to 10^4 with relative error down to the order of 10^-6, so the supported range is wide and the accuracy is high.

Its pipeline architecture enables high-speed, fully pipelined computation, and it reduces computational complexity by exploiting the similarity among the computations of the N roots. The design is efficient, accurate and low in hardware complexity.

The invention and its embodiments have been described above schematically and without limitation. Although the invention has been shown and described with reference to specific preferred embodiments, it should not be construed as limited to them. Various changes in form and detail may be made without departing from the spirit or essential characteristics of the invention, so the embodiments should be regarded as illustrative rather than restrictive, with the scope defined by the appended claims. Several elements described in this application may also be implemented by a single element in software or hardware. The terms first, second, etc. denote names and do not imply any particular order.

Claims

1. A CORDIC-based low-complexity hardware system, characterized by comprising a circular vectoring-mode CV-CORDIC module, a circular rotation-mode CR-CORDIC module, a first linear vectoring-mode LV1-CORDIC module, a second linear vectoring-mode LV2-CORDIC module, a hyperbolic vectoring-mode HV-CORDIC module, a hyperbolic rotation-mode HR-CORDIC module and a result processing unit;

2. A CORDIC based low complexity hardware system according to claim 1 wherein the system uses a pipelined architecture.

3. A CORDIC based low complexity hardware system according to claim 1 wherein the result processing unit comprises several result units in parallel.

4. A method for computing complex N-th roots for N from 2 to 10 using the CORDIC-based low-complexity hardware system of any one of claims 1 to 3, wherein the complex number z is expressed as z = m + jn, the N-th roots of z are computed from the exponential form of the complex number, and N is an integer greater than or equal to 2 and less than or equal to 10.

5. The method for computing complex N-th roots for N from 2 to 10 according to claim 4, wherein the result processing unit uses the trigonometric angle-sum formulas, implemented with shift and add operations, to obtain the real and imaginary parts of $\rho^{1/N} e^{j(2d\pi+\theta)/N}$.

6. The method for computing complex N-th roots for N from 2 to 10 according to claim 5, wherein the N different values of d, d = 0, 1, …, N-1, give the N roots, and once d is fixed, $\cos(2d\pi/N)$ and $\sin(2d\pi/N)$ are constants that are computed in advance and stored in a look-up table.

7. The method according to claim 5, wherein the integer N supplied in each clock cycle changes dynamically between 2 and 10, and the result processing unit activates different result units according to the input N and computes the N roots in parallel.

8. The method according to claim 7, wherein the result processing unit is a parallel result processing unit that shares computing resources by exploiting the similarity among the computations of the N roots.

9. The method according to claim 5, wherein each CORDIC module performs X iterations in an X-stage pipeline architecture to make the result converge, the iterations comprising positive iterations that determine the calculation accuracy and negative iterations that extend the calculation convergence range, and X is an integer greater than zero.

10. The method according to claim 6, wherein constant values related to N are stored in a look-up table, and the corresponding values are retrieved according to the integer N between 2 and 10.