WO2020206716A1 - Parallel Jacobi calculation acceleration implementation method for FPGA - Google Patents

Parallel Jacobi calculation acceleration implementation method for FPGA (一种用于FPGA的并行Jacobi计算加速实现方法)

Info

Publication number
WO2020206716A1
Authority
WO
WIPO (PCT)
Prior art keywords
diagonal
processing unit
elements
symbol
rotation angle
Prior art date
Application number
PCT/CN2019/083494
Other languages
English (en)
French (fr)
Inventor
陈积明
史治国
吴均峰
何倩雯
刘颖
孙优贤
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学 (Zhejiang University)
Priority to US17/420,682 (published as US20220100815A1)
Publication of WO2020206716A1 (in Chinese)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/4806 Computations with complex numbers
    • G06F7/4818 Computations with complex numbers using coordinate rotation digital computer [CORDIC]

Definitions

  • The invention belongs to FPGA internal data processing methods in the field of signal processing, and relates to a parallel Jacobi calculation acceleration implementation method for FPGA.
  • The problem addressed by the invention is to provide a parallel Jacobi calculation acceleration implementation method for FPGA.
  • The parallel Jacobi calculation is designed so that it performs better when implemented inside the FPGA.
  • The technical problems of slow data processing and high resource consumption inside the FPGA are solved: one step of the parallel Jacobi calculation is completed within one CORDIC algorithm cycle, and fewer FPGA on-chip resources are consumed.
  • The invention adopts a technical solution comprising the following steps:
  • The data of the n×n matrix is input into the FPGA and rotated by the parallel Jacobi calculation. The parallel Jacobi calculation uses the CORDIC (coordinate rotation digital computer) algorithm for plane rotation, and a two-dimensional xy coordinate system is established for the plane rotation;
  • CORDIC: coordinate rotation digital computer
  • P_ij denotes the processing unit in the i-th row and j-th column
  • a_{2i,2j} denotes the element in row 2i and column 2j of the n×n matrix
  • n denotes the dimension of the matrix
  • the n×n matrix is a real symmetric matrix
  • because the matrix is symmetric, only the upper-right part of the allocation obtained above is kept; the lower-left part mirrors the upper-right part across the diagonal.
  • the diagonal processing unit calculates the sign set corresponding to the rotation angle 2θ and outputs it to the non-diagonal processing units
  • k denotes the index of the current iteration
  • N denotes the total number of iterations
  • N is taken as the number of data bits used by the FPGA
  • α_k denotes the first sign parameter of the k-th iteration
  • β_k denotes the second sign parameter of the k-th iteration
  • θ_0 denotes the initial value of the rotation angle (i.e. 2θ)
  • θ_k denotes the remaining rotation angle after k iterations
  • φ_{k-1} denotes the angle parameter of the (k-1)-th iteration
  • d_{2θ,k} denotes the sign corresponding to the rotation angle 2θ in the k-th iteration
  • d_{2θ,k} is obtained by XORing the sign bits of α_{k-1} and β_{k-1}: if the sign bits are equal, d_{2θ,k} is 1; if they differ, d_{2θ,k} is -1. A shift operation applied to α_{k-1} gives α_{k-1}2^{k-1}, and a shift operation applied to β_{k-1} gives β_{k-1}2^{k-1}.
  • if d_{2θ,k} is 1, α_k is obtained by a subtraction between α_{k-1}2^{k-1} and β_{k-1}
  • if d_{2θ,k} is 1, β_k is obtained by an addition of β_{k-1}2^{k-1} and α_{k-1}
  • if d_{2θ,k} is -1, α_k is obtained by an addition of α_{k-1}2^{k-1} and β_{k-1}, and β_k by a subtraction between β_{k-1}2^{k-1} and α_{k-1}.
  • the initial rotation angle corresponding to the non-diagonal elements in the diagonal processing unit is θ, which is calculated as:
  • a_pq and a_qp denote the two non-diagonal elements initially held by the diagonal processing unit
  • a_qp = a_pq
  • a_pp and a_qq denote the diagonal elements initially held by the diagonal processing unit
  • α_0 denotes the initial first sign parameter
  • β_0 denotes the initial second sign parameter
  • with θ the rotation angle corresponding to the non-diagonal elements of the current diagonal processing unit, β_0 and α_0 are sent to the sign set calculation module as initial values, and the module obtains the sign set {d_{2θ,k}} corresponding to the rotation angle 2θ by iteration.
  • the diagonal processing unit outputs its sign set {d_{2θ,k}} for the rotation angle 2θ to the non-diagonal processing units in the same row and the same column as itself;
  • the d_{2θ,k} obtained in each iteration of step 2) is used as the rotation sign of the k-th CORDIC iteration, replacing the step in which the traditional CORDIC algorithm computes the rotation sign after each iteration; the CORDIC algorithm then rotates the first coordinate to be rotated, (2a_pq, a_pp - a_qq), by the angle 2θ;
  • a'_pp and a'_qq denote the two updated diagonal elements in the diagonal processing unit, and y_1 denotes the y coordinate of the first coordinate after rotation;
  • the non-diagonal processing unit P_ij receives the sign sets output by the two diagonal processing units P_ii and P_jj, whose k-th entries are the signs corresponding to the rotation angles 2θ_i and 2θ_j in the k-th iteration, and combines each pair of signs into two new signs, thereby obtaining two new sign sets
  • for each pair of signs, an XOR operation and a data selector determine the sign set of the rotation angle θ_l - θ_m: if the XOR result is 1, the corresponding sign is taken; otherwise 0 is taken
  • for each pair of signs, an XNOR operation and a data selector determine the sign set of the rotation angle θ_l + θ_m: if the XNOR result is 1, the corresponding sign is taken; otherwise 0 is taken
  • C_2 denotes the second compensation factor
  • C_3 denotes the third compensation factor
  • for the later CORDIC iterations, the second and third compensation factors already differ from 1 by less than 2^{-N+1}, while the precision of an N-bit signed fixed-point number is at best 2^{-N+1}, so the remaining second and third compensation factors can be treated as 1, that is, no compensation is required.
  • x_2 and y_2 denote the coordinates of the second coordinate after rotation
  • x_3 and y_3 denote the coordinates of the third coordinate after rotation
  • the four updated values are obtained as follows: x_2 and y_3 by addition and shift; x_3 and y_2 by addition and shift; x_3 and y_2 by subtraction and shift; x_2 and y_3 by subtraction and shift
  • after the exchange, the Jacobi calculation has updated every non-diagonal element held in the diagonal processing units of the n×n matrix once; the method returns to step 2) for the next round, and the update process is repeated until the non-diagonal elements of the n×n matrix converge to 0 within the preset accuracy
  • the n×n matrix is the covariance matrix of data collected by an antenna array or of image data before dimensionality reduction, and is a real symmetric matrix.
  • if n is odd, the matrix is expanded to an even-dimensional matrix by adding an (n+1)-th column and an (n+1)-th row.
  • the elements of the added (n+1)-th column and (n+1)-th row are all 0.
  • the k-th sign calculated in step 2) is supplied to steps 3) and 4) for the k-th CORDIC iteration, so steps 2), 3) and 4) run simultaneously.
  • the CORDIC results are combined in a simple way to obtain the updated value of every processing unit element, and all processing units can be updated at the same time.
  • one step of the parallel Jacobi calculation takes only one CORDIC cycle with this method, compared with three CORDIC cycles for the existing method, which greatly reduces the calculation time.
  • the n×n matrix targeted by the invention is the covariance matrix of data collected by an antenna array for DOA estimation, or the covariance matrix used when image data is reduced in dimension by the PCA algorithm.
  • the invention replaces the two-sided rotation of existing parallel Jacobi calculation with a specially designed linear combination, and uses the rotation angle sign sets and the combination of two sign sets to replace the rotation sign calculation of the existing CORDIC algorithm. This increases the parallelism of the parallel Jacobi calculation, reduces the time of each step, and allows one step of the parallel Jacobi calculation to complete within one CORDIC cycle.
  • the invention speeds up the hardware implementation of parallel Jacobi calculation: one step completes within one CORDIC algorithm cycle, taking only one third of the time of the traditional method.
  • the invention uses fewer FPGA resources and improves the efficiency of eigenvalue decomposition on the FPGA, which is valuable in practical engineering.
  • FIG. 1 is a structural diagram of a diagonal processing unit according to an embodiment of the invention.
  • FIG. 2 is a structural diagram of a non-diagonal processing unit according to an embodiment of the invention.
  • FIG. 3 is a structural diagram of a processing unit array according to an embodiment of the invention.
  • FIG. 4 is a flowchart of the calculation method according to an embodiment of the invention.
  • the FPGA implementation structure of the invention consists mainly of diagonal processing units and non-diagonal processing units.
  • the structure of the diagonal processing unit is shown in FIG. 1 and the structure of the non-diagonal processing unit is shown in FIG. 2.
  • the structure of the processing unit array is shown in FIG. 3.
  • the algorithm execution flow is shown in FIG. 4.
  • this example is implemented on a Xilinx Virtex-7 XC7VX690T FPGA.
  • a four-element antenna array collects the wireless signal transmitted by a drone; the signal's direction of incidence is 0 degrees.
  • the 4×4 real symmetric covariance matrix calculated from the four sets of data received by the four-element array is denoted A.
  • processing unit initialization: each element of R_r is allocated to a processing unit P_ij. Each processing unit is connected to its adjacent processing units through a data interface.
  • the diagonal processing unit calculates the sign set corresponding to the rotation angle and outputs it to the non-diagonal processing units.
  • let the diagonal elements held by the diagonal processing unit be a_pp and a_qq.
  • the number of iterations equals that of the CORDIC algorithm and is taken as the number of data bits used by the current system, 16.
  • diagonal processing unit element update: the compensation factor is computed first.
  • non-diagonal processing unit element update: the non-diagonal processing unit P_ij receives the sign sets output by the two diagonal processing units P_ii and P_jj and calculates the two combined sign sets.
  • element exchange between processing units: after the elements of each processing unit are updated, the symmetric matrix elements are updated to the same values, and the updated elements are exchanged with those of other processing units.
  • after the exchanges, every non-diagonal element of the matrix has been updated once by a diagonal processing unit. The update process is repeated so that the non-diagonal elements of the matrix gradually converge to 0; updating ends when the user-preset convergence accuracy is reached, and the parallel Jacobi calculation ends.
  • the non-diagonal elements of the matrix have reached the convergence condition (the parallel Jacobi calculation drives the non-diagonal elements toward 0; because the implementation represents fractions with finite-length fixed-point numbers, the non-diagonal elements can reach exactly 0, but an error is also introduced). At this point the elements on the diagonal are the obtained eigenvalues.
  • the obtained eigenvalues are used in the signal DOA (direction of arrival) estimation algorithm.
  • DOA: direction of arrival
  • this embodiment reports the performance of the invention in terms of running time and FPGA resource consumption.
  • because the data are 16-bit fixed-point numbers, the CORDIC algorithm iterates 16 times internally. Including result compensation, one CORDIC algorithm cycle is 17 FPGA clock cycles. Element exchange between steps takes 1 more clock cycle, so one step of the parallel Jacobi calculation requires 18 clock cycles in total.
  • the convergence condition is that the largest absolute value among the non-diagonal elements of the covariance matrix is less than 0.001. It was reached after 8 iterations, taking 144 clock cycles.
  • the clock frequency used in this example is 250 MHz, so the run takes 0.576 microseconds.
  • in the traditional parallel Jacobi implementation, a diagonal processing unit needs one CORDIC algorithm cycle to solve for the rotation angle and then applies the CORDIC algorithm twice, using that angle, to update its elements, for a total of 3 CORDIC algorithm cycles. A non-diagonal processing unit must wait for the diagonal processing units to solve for the rotation angles and then also applies the CORDIC algorithm twice to update its elements.
  • the two rotation angles are the one passed by the diagonal processing unit in the same row and the one passed by the diagonal processing unit in the same column.
  • with all processing units working in parallel, the traditional method takes at least 3 CORDIC algorithm cycles to complete one step of the parallel Jacobi calculation.
  • the CORDIC instances used in the invention all work in parallel, so only one CORDIC algorithm cycle is required. Table 2 compares the processing unit procedures of the invention and the traditional method.
  • the invention solves for eigenvalues significantly faster than the traditional method, and is valuable when practical engineering requires fast eigenvalue decomposition.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Complex Calculations (AREA)

Abstract

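A parallel Jacobi calculation acceleration implementation method for FPGA. The method comprises: inputting the data of an n×n matrix into the FPGA and applying rotation transformations by parallel Jacobi calculation, in which the CORDIC algorithm performs the plane rotations; initializing the processing units; having each diagonal processing unit calculate the sign set corresponding to its rotation angle and output it to the non-diagonal processing units; updating the elements of the diagonal processing units; updating the elements of the non-diagonal processing units; and exchanging elements between processing units, that is, after the elements of every processing unit have been updated, the updated elements are exchanged between the processing units.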

Description

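Parallel Jacobi calculation acceleration implementation method for FPGA
Technical field
The invention belongs to FPGA internal data processing methods in the field of signal processing, and relates to a parallel Jacobi calculation acceleration implementation method for FPGA.
Background
Many algorithms in radar, wireless communication, image processing and other fields need to compute the eigenvalues of a matrix. For example, eigenvalue computation is a key step of subspace-based DOA (Direction of Arrival) estimation algorithms and of the PCA (Principal Component Analysis) algorithm.
Many algorithms exist for computing eigenvalues, such as the QR algorithm, the LU decomposition algorithm and the algebraic method. The algebraic method is unsuitable for large matrices because the complexity of its root-finding step grows sharply with the matrix dimension, and the LU decomposition algorithm can only be applied to invertible matrices. Moreover, although the QR algorithm computes eigenvalues faster than serial Jacobi calculation, Jacobi calculation has been shown to be more accurate than the QR algorithm. Jacobi calculation transforms the matrix step by step into an approximately diagonal matrix through a series of rotations; the diagonal elements of that matrix are the eigenvalues. In addition, Jacobi calculation is inherently parallel when applied to the eigenvalue decomposition of real symmetric matrices, so parallel Jacobi calculation (a parallel implementation of Jacobi calculation) is widely used in FPGA implementations of eigenvalue decomposition.
Some work on accelerating parallel Jacobi calculation exists, but most methods cannot complete one step of the parallel Jacobi calculation within one CORDIC algorithm cycle. Existing approximate Jacobi calculation can do so, but because its rotations are approximate, the total number of rotations increases, so the result is unsatisfactory. Furthermore, although the total LUT (look-up table) resources of an FPGA are limited, existing implementations do not consider LUT consumption.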
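Summary of the invention
In view of the problems described in the background, the invention provides a parallel Jacobi calculation acceleration implementation method for FPGA. It gives the parallel Jacobi calculation a better-performing implementation inside the FPGA, solves the technical problems of slow data processing and high resource consumption inside the FPGA, completes one step of the parallel Jacobi calculation within one CORDIC algorithm cycle, and consumes fewer FPGA on-chip resources.
To achieve this, the invention adopts a technical solution comprising the following steps:
1) Processing unit initialization
The data of the n×n matrix is input into the FPGA and rotated by the parallel Jacobi calculation. The parallel Jacobi calculation uses the CORDIC (coordinate rotation digital computer) algorithm for plane rotation, and a two-dimensional xy coordinate system is established for the plane rotation;
The FPGA (field-programmable gate array) contains multiple processing units arranged in an array. Each processing unit is connected to its adjacent processing units through data interfaces for data interaction and element exchange. The elements of the n×n matrix on which the parallel Jacobi calculation is performed are allocated to the processing units P_ij according to the following formula:
Figure PCTCN2019083494-appb-000001
where P_ij denotes the processing unit in the i-th row and j-th column, a_{2i,2j} denotes the element in row 2i and column 2j of the n×n matrix, and n denotes the dimension of the matrix;
A processing unit P_ij with i = j is a diagonal processing unit; otherwise it is a non-diagonal processing unit. Within a processing unit P_ij, the elements whose subscripts satisfy 2i = 2j or 2i-1 = 2j-1 are diagonal elements; the others are non-diagonal elements;
Because the n×n matrix is real symmetric, only the upper-right part of the allocation obtained above is kept; the lower-left part mirrors the upper-right part across the diagonal.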
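The allocation in step 1) can be sketched as follows. The allocation formula itself is an image that is not reproduced in this text, so the block layout below (P_ij holding rows 2i-1 to 2i and columns 2j-1 to 2j, 1-based) is an assumption inferred from the subscripts a_{2i,2j} and the description of diagonal elements; the function name `allocate_blocks` is hypothetical.

```python
def allocate_blocks(a):
    """Split an n x n matrix (list of lists, n even) into 2 x 2 blocks.

    Assumption: P_ij holds rows 2i-1..2i and columns 2j-1..2j (1-based).
    Only blocks with i <= j are kept, since the matrix is symmetric.
    """
    n = len(a)
    if n % 2:
        raise ValueError("n must be even")
    units = {}
    for i in range(1, n // 2 + 1):
        for j in range(i, n // 2 + 1):
            r, c = 2 * i - 2, 2 * j - 2  # 0-based top-left corner of the block
            units[(i, j)] = [[a[r][c], a[r][c + 1]],
                             [a[r + 1][c], a[r + 1][c + 1]]]
    return units


A = [[4, 1, 2, 0],
     [1, 3, 0, 1],
     [2, 0, 2, 1],
     [0, 1, 1, 1]]
units = allocate_blocks(A)
```

For this 4×4 example, `units` has the keys (1, 1), (1, 2) and (2, 2): two diagonal processing units and one non-diagonal processing unit in the upper-right part.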
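2) The diagonal processing unit calculates the sign set corresponding to the rotation angle 2θ and outputs it to the non-diagonal processing units
The sign set {d_{2θ,k}}, k = 1, 2, ..., N, corresponding to the CORDIC rotation angle 2θ is obtained iteratively with the following formulas; the total number of iterations equals that of the CORDIC algorithm:
Figure PCTCN2019083494-appb-000002
Figure PCTCN2019083494-appb-000003
where k denotes the iteration index and N the total number of iterations, N being taken as the number of data bits used by the FPGA; α_k denotes the first sign parameter of the k-th iteration, β_k the second sign parameter of the k-th iteration, θ_0 the initial value of the rotation angle (i.e. 2θ), θ_k the remaining rotation angle after k iterations, φ_{k-1} the angle parameter of the (k-1)-th iteration, and d_{2θ,k} the sign corresponding to the rotation angle 2θ in the k-th iteration;
Specifically, in the sign calculation module, d_{2θ,k} is obtained by XORing the sign bits of α_{k-1} and β_{k-1} (if the sign bits are equal, d_{2θ,k} is 1; if they differ, d_{2θ,k} is -1). A shift operation applied to α_{k-1} gives α_{k-1}2^{k-1}, and a shift operation applied to β_{k-1} gives β_{k-1}2^{k-1}. If d_{2θ,k} is 1, α_k is obtained by a subtraction between α_{k-1}2^{k-1} and β_{k-1}, and β_k by an addition of β_{k-1}2^{k-1} and α_{k-1}; if d_{2θ,k} is -1, α_k is obtained by an addition of α_{k-1}2^{k-1} and β_{k-1}, and β_k by a subtraction between β_{k-1}2^{k-1} and α_{k-1}.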
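A minimal floating-point sketch of the sign set calculation is given below. It uses the textbook vectoring-mode CORDIC recurrence with right shifts by 2^{-(k-1)}, which is an assumption: the patent's own recurrence is in formula images not reproduced here and is described with terms of the form α_{k-1}2^{k-1}. Multiplying both textbook updates by 2^{k-1} yields terms of that form and leaves every sign unchanged, so the resulting signs should agree, but this has not been checked against the original formulas. The function name `sign_set` is hypothetical.

```python
import math


def sign_set(alpha0, beta0, n_iter=16):
    """Return the rotation signs d_k (+1 or -1), k = 1..n_iter.

    d_k is +1 when alpha and beta have the same sign bit and -1 otherwise;
    zero counts as non-negative, like a two's-complement sign bit.
    """
    alpha, beta = float(alpha0), float(beta0)
    signs = []
    for k in range(1, n_iter + 1):
        d = 1 if (alpha < 0) == (beta < 0) else -1
        shift = 2.0 ** -(k - 1)
        # Textbook vectoring step (assumed form): drives alpha toward 0.
        alpha, beta = alpha - d * beta * shift, beta + d * alpha * shift
        signs.append(d)
    return signs


# alpha0 = 2*a_pq and beta0 = a_pp - a_qq for a_pq = 0.3, a_pp = 1.0, a_qq = 0.2
signs = sign_set(alpha0=2 * 0.3, beta0=1.0 - 0.2)
angle = sum(d * math.atan(2.0 ** -(k - 1)) for k, d in enumerate(signs, start=1))
```

The signed sum of the elementary angles atan(2^{-(k-1)}) approximates 2θ = atan(α_0/β_0), which is why the sign set alone is enough to drive the later rotations.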
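At the start of the iteration, the initial rotation angle corresponding to the non-diagonal elements in the diagonal processing unit is θ, calculated as:
Figure PCTCN2019083494-appb-000004
where a_pq and a_qp denote the two non-diagonal elements initially held by the diagonal processing unit, with a_qp = a_pq; a_pp and a_qq denote the diagonal elements initially held by the diagonal processing unit; α_0 denotes the initial first sign parameter and β_0 the initial second sign parameter;
The diagonal elements a_pp and a_qq of the diagonal processing unit give β_0 = a_pp - a_qq by subtraction, and the non-diagonal element a_pq gives α_0 = 2a_pq by a shift. Let θ be the rotation angle that the parallel Jacobi calculation associates with the non-diagonal elements of the current diagonal processing unit. β_0 and α_0 are sent to the sign set calculation module as initial values, and the module obtains the sign set {d_{2θ,k}} corresponding to the rotation angle 2θ by iteration.
Finally, the diagonal processing unit outputs the sign set {d_{2θ,k}} it has calculated for the rotation angle 2θ to the non-diagonal processing units in the same row and the same column as itself;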
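3) Diagonal processing unit element update
The d_{2θ,k} obtained in each iteration of step 2) is used as the rotation sign of the k-th CORDIC iteration, replacing the step in which the traditional CORDIC algorithm computes the rotation sign after each iteration. The CORDIC algorithm rotates the first coordinate to be rotated, (2a_pq, a_pp - a_qq), in the plane by the angle 2θ;
After all iterations of step 2) are complete, the final plane rotation result is multiplied by the first compensation factor to obtain the rotated y coordinate, y_1 = 2a_pq sin 2θ + (a_pp - a_qq) cos 2θ. The first compensation factor is given by:
Figure PCTCN2019083494-appb-000005
where C_1 denotes the first compensation factor;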
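The rotation driven by a precomputed sign set can be sketched as below. This is a floating-point illustration under stated assumptions, not the patent's fixed-point datapath: iteration k is assumed to use the shift 2^{-(k-1)}, the rotation is taken as counter-clockwise (consistent with y_1 = x sin 2θ + y cos 2θ for the point (x, y)), and the compensation factor is assumed to be the reciprocal of the accumulated CORDIC gain. The function name `rotate_with_signs` is hypothetical.

```python
import math


def rotate_with_signs(x, y, signs):
    """Rotate (x, y) counter-clockwise using precomputed signs in {-1, 0, 1}.

    Iteration k (starting at 1) turns the point by signs[k-1] * atan(2**-(k-1)).
    The accumulated gain is removed at the end by the compensation factor.
    """
    gain = 1.0
    for k, d in enumerate(signs, start=1):
        shift = 2.0 ** -(k - 1)
        x, y = x - d * y * shift, y + d * x * shift   # shift-and-add micro-rotation
        if d != 0:
            gain *= math.sqrt(1.0 + shift * shift)    # a skipped step adds no gain
    c = 1.0 / gain                                     # assumed compensation factor
    return x * c, y * c


signs = [1, -1, 1, 1]
total = sum(d * math.atan(2.0 ** -(k - 1)) for k, d in enumerate(signs, start=1))
x1, y1 = rotate_with_signs(1.0, 0.0, signs)
```

Because no sign is computed inside the loop, iteration k of this rotation only needs d_k, which is what lets it run in lockstep with the sign set calculation of step 2).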
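The diagonal elements of the diagonal processing unit are then updated with the following formulas, and the non-diagonal elements are set to 0:
Figure PCTCN2019083494-appb-000006
Figure PCTCN2019083494-appb-000007
where a'_pp and a'_qq denote the two updated diagonal elements of the diagonal processing unit, and y_1 denotes the y coordinate of the first coordinate after rotation;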
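The two update formulas are images that are not reproduced in this text. A plausible form, stated here as an assumption, is a'_pp = (a_pp + a_qq + y_1)/2 and a'_qq = (a_pp + a_qq - y_1)/2, which follows from tan 2θ = 2a_pq/(a_pp - a_qq) for a 2×2 symmetric block. The sketch below checks that assumed form numerically by confirming that the trace and determinant of the block are preserved; `update_diagonal_unit` is a hypothetical name.

```python
import math


def update_diagonal_unit(a_pp, a_pq, a_qq):
    """Return (a_pp', a_qq') for the symmetric block [[a_pp, a_pq], [a_pq, a_qq]]."""
    two_theta = math.atan(2 * a_pq / (a_pp - a_qq))   # assumes a_pp != a_qq
    y1 = 2 * a_pq * math.sin(two_theta) + (a_pp - a_qq) * math.cos(two_theta)
    # Assumed update: half the trace plus or minus half of y1.
    return (a_pp + a_qq + y1) / 2, (a_pp + a_qq - y1) / 2


new_pp, new_qq = update_diagonal_unit(4.0, 1.0, 3.0)
```

For the block [[4, 1], [1, 3]] the two results sum to the trace 7 and multiply to the determinant 11, so they are the eigenvalues of the block, as expected once its non-diagonal elements are set to 0.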
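4) Non-diagonal processing unit element update
4.1) The non-diagonal processing unit P_ij receives the sign sets output by the two diagonal processing units P_ii and P_jj, denoted
Figure PCTCN2019083494-appb-000008
Figure PCTCN2019083494-appb-000009
which denote the signs corresponding to the rotation angles 2θ_i and 2θ_j in the k-th iteration. The following formulas are used to calculate the two signs
Figure PCTCN2019083494-appb-000010
Figure PCTCN2019083494-appb-000011
thereby obtaining the two sign sets
Figure PCTCN2019083494-appb-000012
Figure PCTCN2019083494-appb-000013
Figure PCTCN2019083494-appb-000014
Figure PCTCN2019083494-appb-000015
where
Figure PCTCN2019083494-appb-000016
Figure PCTCN2019083494-appb-000017
denote the signs corresponding to the two combined rotation angles formed from θ_i and θ_j (their difference and their sum); 2θ_i and 2θ_j denote twice the rotation angles corresponding to the non-diagonal elements of the two diagonal processing units P_ii and P_jj;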
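Specifically, for each pair of signs
Figure PCTCN2019083494-appb-000018
an XOR operation and a data selector determine the sign set of the rotation angle θ_l - θ_m: if the XOR result is 1,
Figure PCTCN2019083494-appb-000019
is taken as
Figure PCTCN2019083494-appb-000020
otherwise 0 is taken as
Figure PCTCN2019083494-appb-000021
For each pair of signs
Figure PCTCN2019083494-appb-000022
an XNOR operation and a data selector determine the sign set of the rotation angle θ_l + θ_m: if the XNOR result is 1,
Figure PCTCN2019083494-appb-000023
is taken as
Figure PCTCN2019083494-appb-000024
otherwise 0 is taken as
Figure PCTCN2019083494-appb-000025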
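The XOR/XNOR selection above can be sketched in a few lines. The reasoning, offered as an interpretation: if 2θ_i and 2θ_j are each a signed sum of the same elementary angles, then θ_i - θ_j and θ_i + θ_j are signed sums with coefficients (d_i - d_j)/2 and (d_i + d_j)/2, which take values in {-1, 0, 1}. Which of the two received signs the data selector passes through is shown only in an image; the sketch assumes it is the sign from the first set. The function name `combine_signs` is hypothetical.

```python
def combine_signs(d_i, d_j):
    """Combine two sign sets (entries +1 or -1) into difference and sum sign sets.

    Each output entry is in {-1, 0, 1}.
    """
    diff_set, sum_set = [], []
    for di, dj in zip(d_i, d_j):
        differ = di != dj                          # XOR of the two sign bits
        diff_set.append(di if differ else 0)       # equals (di - dj) / 2
        sum_set.append(di if not differ else 0)    # XNOR; equals (di + dj) / 2
    return diff_set, sum_set


diff_set, sum_set = combine_signs([1, -1, 1, -1], [1, 1, -1, -1])
```

In every iteration exactly one of the two combined signs is non-zero, so exactly one of the two combined rotations advances while the other is skipped.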
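4.2)
Figure PCTCN2019083494-appb-000026
takes one of the three values {-1, 0, 1}. The following formulas are used to calculate, for the two sign sets
Figure PCTCN2019083494-appb-000027
Figure PCTCN2019083494-appb-000028
and the first
Figure PCTCN2019083494-appb-000029
signs of each, the values of the second and third compensation factors for every possible sign combination. One sign combination consists of
Figure PCTCN2019083494-appb-000030
signs. The second and third compensation factor values for the different sign combinations form the look-up table data, and the absolute values of the first
Figure PCTCN2019083494-appb-000031
signs form the look-up address. The look-up table is generated in Block Memory (block RAM); its number of address bits is
Figure PCTCN2019083494-appb-000032
and its data depth is
Figure PCTCN2019083494-appb-000033
Figure PCTCN2019083494-appb-000034
Figure PCTCN2019083494-appb-000035
where C_2 denotes the second compensation factor and C_3 the third compensation factor;
When the number of CORDIC iterations exceeds
Figure PCTCN2019083494-appb-000036
(
Figure PCTCN2019083494-appb-000037
rounded up), the second and third compensation factors already differ from 1 by less than 2^{-N+1}, while the precision of an N-bit signed fixed-point number is at best 2^{-N+1}. The remaining second and third compensation factors can therefore be treated as 1, that is, no compensation is required.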
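The look-up table of step 4.2) can be sketched as follows. The compensation formulas and the iteration threshold are images not reproduced here, so several details are assumptions: the threshold is taken as ceil(N/2) (consistent with the 8 address bits used with 16-bit data in the embodiment), address bit k-1 is taken to hold |d_k|, iteration k is taken to use the shift 2^{-(k-1)}, and an iteration with sign 0 is taken to contribute no gain. The function name `build_compensation_lut` is hypothetical.

```python
import math


def build_compensation_lut(n_bits=16):
    """Compensation factors indexed by the |d_k| bits of the first ceil(n_bits/2) iterations.

    Assumption: address bit k-1 holds |d_k|, and iteration k uses shift 2**-(k-1).
    """
    addr_bits = math.ceil(n_bits / 2)
    lut = []
    for addr in range(2 ** addr_bits):
        gain = 1.0
        for k in range(1, addr_bits + 1):
            if (addr >> (k - 1)) & 1:          # iteration k actually rotated
                gain *= math.sqrt(1.0 + 4.0 ** -(k - 1))
        lut.append(1.0 / gain)
    return lut


lut = build_compensation_lut(16)
# Gain left uncompensated if every later iteration (k = 9..16) rotates.
tail = math.prod(math.sqrt(1.0 + 4.0 ** -(k - 1)) for k in range(9, 17))
```

With 16-bit data this gives 2^8 = 256 entries, and the uncompensated tail gain differs from 1 by less than 2^{-15}, in line with the statement that the remaining factors can be treated as 1.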
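4.3) For a non-diagonal processing unit, denote the four elements it holds as
Figure PCTCN2019083494-appb-000038
The obtained
Figure PCTCN2019083494-appb-000039
is used as the rotation sign of the k-th CORDIC iteration. For the second coordinate to be rotated,
Figure PCTCN2019083494-appb-000040
the CORDIC algorithm performs a plane rotation by the corresponding combined rotation angle of θ_i and θ_j. The plane rotation result is multiplied by the second compensation factor, whose value is read from the look-up table of step 4.2), giving the rotated coordinates:
Figure PCTCN2019083494-appb-000041
where x_2 and y_2 denote the coordinates of the second coordinate after rotation;
The obtained
Figure PCTCN2019083494-appb-000042
is used as the rotation sign of the k-th CORDIC iteration. For the third coordinate to be rotated,
Figure PCTCN2019083494-appb-000043
the CORDIC algorithm performs a plane rotation by the corresponding combined rotation angle of θ_i and θ_j. The plane rotation result is multiplied by the third compensation factor, whose value is read from the look-up table of step 4.2), giving the rotated coordinates:
Figure PCTCN2019083494-appb-000044
where x_3 and y_3 denote the coordinates of the third coordinate after rotation;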
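4.4) The elements of the non-diagonal processing unit are then updated with the following formulas:
Figure PCTCN2019083494-appb-000045
Figure PCTCN2019083494-appb-000046
Figure PCTCN2019083494-appb-000047
Figure PCTCN2019083494-appb-000048
where
Figure PCTCN2019083494-appb-000049
Figure PCTCN2019083494-appb-000050
denote the four elements held by the non-diagonal processing unit;
Specifically, x_2 and y_3 give, by addition and shift, the updated value shown in the formula
Figure PCTCN2019083494-appb-000051
x_3 and y_2 give, by addition and shift, the updated value shown in the formula
Figure PCTCN2019083494-appb-000052
x_3 and y_2 give, by subtraction and shift, the updated value shown in the formula
Figure PCTCN2019083494-appb-000053
x_2 and y_3 give, by subtraction and shift, the updated value shown in the formula
Figure PCTCN2019083494-appb-000054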
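The idea behind steps 4.3) and 4.4), replacing a two-sided rotation of a 2×2 block with two independent plane rotations followed by additions and halvings, can be checked numerically. The sketch below uses one standard decomposition; the patent's own coordinate definitions, signs and pairing of x_2, y_2, x_3, y_3 are in images not reproduced here, so the variable layout (p, q, r, s) and the rotation conventions are assumptions made for illustration only.

```python
import math


def rot(t):
    """Counter-clockwise plane rotation matrix."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]


def matmul(m, n):
    return [[sum(m[r][k] * n[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def transpose(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]


def two_sided(b, theta_i, theta_j):
    """Reference result: R(theta_i)^T * B * R(theta_j)."""
    return matmul(matmul(transpose(rot(theta_i)), b), rot(theta_j))


def linear_combination(b, theta_i, theta_j):
    """Same result from two independent plane rotations plus adds and halvings."""
    (a, bb), (c, d) = b
    p, q = (a + d) / 2, (bb - c) / 2          # rotated by theta_i - theta_j
    r, s = (a - d) / 2, (bb + c) / 2          # rotated by -(theta_i + theta_j)
    dm, sm = theta_i - theta_j, theta_i + theta_j
    p2, q2 = p * math.cos(dm) - q * math.sin(dm), p * math.sin(dm) + q * math.cos(dm)
    r2, s2 = r * math.cos(sm) + s * math.sin(sm), -r * math.sin(sm) + s * math.cos(sm)
    return [[p2 + r2, q2 + s2], [s2 - q2, p2 - r2]]


B = [[0.5, -1.2], [0.7, 2.0]]
ref = two_sided(B, 0.3, -0.8)
fast = linear_combination(B, 0.3, -0.8)
```

The two plane rotations are independent of each other, so in hardware they can run side by side in a single CORDIC cycle instead of one after the other.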
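5) Element exchange between processing units
After the elements of every processing unit have been updated, the symmetric matrix elements are updated to the same values, and the updated elements are exchanged between processing units:
5.A) Exchange of the diagonal elements of the diagonal processing units
Let the current diagonal processing unit P_ii hold the diagonal elements
Figure PCTCN2019083494-appb-000055
Figure PCTCN2019083494-appb-000056
Then:
For the diagonal element
Figure PCTCN2019083494-appb-000057
where i denotes the row and column index of the diagonal processing unit: if i = 1, the diagonal element
Figure PCTCN2019083494-appb-000058
is unchanged; if i = 2, the value of the diagonal element
Figure PCTCN2019083494-appb-000059
is replaced by the value of the diagonal element
Figure PCTCN2019083494-appb-000060
; if i > 2, the value of the diagonal element
Figure PCTCN2019083494-appb-000061
is replaced by the value of the diagonal element
Figure PCTCN2019083494-appb-000062
;
For the diagonal element
Figure PCTCN2019083494-appb-000063
Figure PCTCN2019083494-appb-000064
then the value of the diagonal element
Figure PCTCN2019083494-appb-000065
is replaced by the value of
Figure PCTCN2019083494-appb-000066
; if
Figure PCTCN2019083494-appb-000067
then the value of the diagonal element
Figure PCTCN2019083494-appb-000068
is replaced by the value of the diagonal element
Figure PCTCN2019083494-appb-000069
;
5.B) The non-diagonal elements of the diagonal processing units and the elements of the non-diagonal processing units are exchanged by changing position in the same way: each such element is moved, possibly across processing units, so that its row subscript equals the row number of the diagonal element that, after the exchange of step 5.A), lies in the same row as the moved element, and its column subscript equals the column number of the diagonal element that, after the exchange of step 5.A), lies in the same column;
6) After the exchange, the Jacobi calculation has updated every non-diagonal element held in the diagonal processing units of the n×n matrix once. The method returns to step 2) for the next round of processing and updating. The update process is repeated so that the non-diagonal elements of the n×n matrix gradually converge to 0; updating ends once the preset convergence accuracy is reached, and the parallel Jacobi calculation ends.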
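The exchange rules of step 5.A) depend on images that are not reproduced here. Their visible structure (the first unit keeps one element fixed, the second unit takes an element from the first, later units take an element from their predecessor) resembles the classical round-robin ordering used in systolic Jacobi arrays. The sketch below implements that textbook ordering on index pairs; treating it as the patent's exact rule is an assumption, and the names `brent_luk_step` and `all_pairs` are hypothetical.

```python
def brent_luk_step(top, bot):
    """One exchange: the first top index stays fixed, all other indices rotate around it."""
    m = len(top)
    new_top = [top[0]] + ([bot[0]] + top[1:m - 1] if m > 1 else [])
    new_bot = bot[1:] + [top[m - 1]] if m > 1 else bot[:]
    return new_top, new_bot


def all_pairs(n):
    """Collect the (p, q) index pairs met by the diagonal units over n - 1 steps."""
    top, bot = list(range(1, n, 2)), list(range(2, n + 1, 2))
    seen = set()
    for _ in range(n - 1):
        seen.update(frozenset(pair) for pair in zip(top, bot))
        top, bot = brent_luk_step(top, bot)
    return seen


pairs4 = all_pairs(4)
pairs8 = all_pairs(8)
```

Under this ordering, n - 1 steps bring every pair of indices together in some diagonal unit exactly once, which matches the embodiment's remark that a 4×4 matrix needs 3 exchanges for every non-diagonal element to be updated once.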
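The n×n matrix is the covariance matrix of data collected by an antenna array or of image data before dimensionality reduction, and is a real symmetric matrix.
In step 1), if n is odd, the matrix is expanded to an even-dimensional matrix by adding an (n+1)-th column and an (n+1)-th row whose elements are all 0.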
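The zero padding for odd n is simple enough to show directly. This is a minimal sketch on plain Python lists; the function name `pad_to_even` is hypothetical.

```python
def pad_to_even(a):
    """Return a copy of square matrix a with a zero row and column added when n is odd."""
    n = len(a)
    if n % 2 == 0:
        return [row[:] for row in a]
    return [row[:] + [0] for row in a] + [[0] * (n + 1)]


padded = pad_to_even([[2, 1, 0], [1, 3, 1], [0, 1, 4]])
```

The padded matrix is block diagonal with a trailing 1×1 zero block, so it has the eigenvalues of the original matrix plus one extra eigenvalue equal to 0.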
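The k-th sign calculated in step 2) is supplied to the CORDIC algorithm of steps 3) and 4) for its k-th iteration, so steps 2), 3) and 4) run simultaneously.
The CORDIC results are combined in a simple way to obtain the updated value of every processing unit element, and all processing units can be updated at the same time. One step of the parallel Jacobi calculation therefore takes only one CORDIC cycle, compared with three CORDIC cycles for the existing method, which greatly reduces the calculation time and improves performance.
The n×n matrix targeted by the invention is the covariance matrix of data collected by an antenna array for DOA estimation, or the covariance matrix used when image data is reduced in dimension by the PCA algorithm.
The beneficial effects of the invention are:
The invention replaces the two-sided rotation of existing parallel Jacobi calculation with a specially designed linear combination, and uses the rotation angle sign sets and the combination of two sign sets to replace the rotation sign calculation of the existing CORDIC algorithm. This increases the parallelism of the parallel Jacobi calculation, reduces the time of each step, and allows one step of the parallel Jacobi calculation to complete within one CORDIC cycle.
The invention speeds up the hardware implementation of parallel Jacobi calculation: one step completes within one CORDIC algorithm cycle, which makes up for the shortcoming of the traditional method and takes only one third of its time.
The invention uses fewer FPGA resources, improves computing performance inside the FPGA, and improves the efficiency of eigenvalue decomposition on the FPGA, which is valuable in practical engineering.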
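Brief description of the drawings
FIG. 1 is a structural diagram of a diagonal processing unit according to an embodiment of the invention;
FIG. 2 is a structural diagram of a non-diagonal processing unit according to an embodiment of the invention;
FIG. 3 is a structural diagram of a processing unit array according to an embodiment of the invention;
FIG. 4 is a flowchart of the calculation method according to an embodiment of the invention.
Detailed description
The implementation of the invention is described in detail below with reference to the drawings and a specific embodiment.
The FPGA implementation structure of the invention consists mainly of diagonal processing units and non-diagonal processing units. The structure of the diagonal processing unit is shown in FIG. 1, and that of the non-diagonal processing unit in FIG. 2. The structure of the processing unit array is shown in FIG. 3. The algorithm execution flow is shown in FIG. 4.
The embodiment and its implementation process are as follows:
This example is implemented on a Xilinx Virtex-7 XC7VX690T FPGA. A four-element antenna array collects the wireless signal transmitted by a drone; the signal's direction of incidence is 0 degrees. The 4×4 real symmetric covariance matrix calculated from the four sets of data received by the four-element array is denoted A.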
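Using 16-bit fixed-point numbers, the eigenvalues of
Figure PCTCN2019083494-appb-000070
are computed by the following steps:
(1) Processing unit initialization. Each element of R_r is allocated to a processing unit P_ij according to the allocation formula. Each processing unit is connected to its adjacent processing units through a data interface. A processing unit with i = j is a diagonal processing unit; otherwise it is a non-diagonal processing unit. Matrix elements whose subscripts satisfy 2i = 2j or 2i-1 = 2j-1 are diagonal elements; the others are non-diagonal elements.
(2) The diagonal processing unit calculates the sign set corresponding to the rotation angle and outputs it to the non-diagonal processing units. Let the non-diagonal elements held by the diagonal processing unit be a_pq and a_qp, with a_qp = a_pq. Let the diagonal elements held by the diagonal processing unit be a_pp and a_qq. Let α_0 = 2a_pq and β_0 = a_pp - a_qq. Let θ be the rotation angle corresponding to the non-diagonal elements of the current diagonal processing unit. The sign set d_{2θ,k}, k = 1, 2, ..., 16, corresponding to the CORDIC rotation angle 2θ is obtained by iteration. The number of iterations equals that of the CORDIC algorithm and is taken as the number of data bits used by the current system, 16.
(3) Diagonal processing unit element update. The compensation factor is computed. The d_{2θ,k} obtained in step (2) is used as the rotation sign of the k-th CORDIC iteration, replacing the step in which the traditional CORDIC algorithm computes the rotation sign after each iteration. The CORDIC algorithm rotates (2a_pq, a_pp - a_qq) by 2θ and the result is multiplied by the compensation factor, giving the rotated y coordinate y_1 = 2a_pq sin 2θ + (a_pp - a_qq) cos 2θ. The diagonal elements of the diagonal processing unit are updated, and the non-diagonal elements are set to 0.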
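(4) Non-diagonal processing unit element update. The non-diagonal processing unit P_ij receives the sign sets output by the two diagonal processing units P_ii and P_jj, denoted
Figure PCTCN2019083494-appb-000071
and calculates
Figure PCTCN2019083494-appb-000072
Figure PCTCN2019083494-appb-000073
Figure PCTCN2019083494-appb-000074
takes one of the three values {-1, 0, 1}. The compensation factor values are calculated for every possible combination of values of the first 16 signs of the sign set
Figure PCTCN2019083494-appb-000075
These compensation factor values form the look-up table data, and the absolute values of the signs among the first 16 of the sign set form the look-up address; the look-up table is generated in Block Memory. When the number of CORDIC iterations exceeds 8, the compensation factor already differs from 1 by less than 2^{-7}, while the precision of 8-bit data is at best 2^{-7}, so the remaining compensation factors can be treated as 1, that is, no compensation is required. The look-up table uses 8 address bits and has a data depth of 2^8. The look-up table of this example is shown in Table 1.
Figure PCTCN2019083494-appb-000076
Table 1: Compensation value look-up table
Let the matrix elements held by the current non-diagonal processing unit be
Figure PCTCN2019083494-appb-000077
Figure PCTCN2019083494-appb-000078
is used as the rotation sign of the k-th CORDIC iteration. For
Figure PCTCN2019083494-appb-000079
the CORDIC algorithm performs a rotation by the corresponding combined angle of θ_l and θ_m; the compensation factor is read from the look-up table and multiplied into the result, giving the rotated coordinates.
Figure PCTCN2019083494-appb-000080
is used as the rotation sign of the k-th CORDIC iteration. For
Figure PCTCN2019083494-appb-000081
the CORDIC algorithm performs a rotation by the corresponding combined angle of θ_l and θ_m; the compensation factor is read from the look-up table and multiplied into the result, giving the rotated coordinates.
The elements of the non-diagonal processing unit are then updated.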
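(5) Element exchange between processing units. After the elements of every processing unit have been updated, the symmetric matrix elements are updated to the same values, and the updated elements are exchanged with those of other processing units.
The method then returns to steps (2), (3) and (4) for a new round of calculation and updating. After 3 exchanges, the Jacobi calculation has updated every non-diagonal element of the matrix once through a diagonal processing unit. The update process is repeated so that the non-diagonal elements of the matrix gradually converge to 0; updating ends once the user-preset convergence accuracy is reached, and the parallel Jacobi calculation ends.
The detailed results are as follows:
Round 1:
Figure PCTCN2019083494-appb-000082
Figure PCTCN2019083494-appb-000083
After the update:
Figure PCTCN2019083494-appb-000084
After the element exchange:
Figure PCTCN2019083494-appb-000085
Round 2:
Figure PCTCN2019083494-appb-000086
Figure PCTCN2019083494-appb-000087
After the update:
Figure PCTCN2019083494-appb-000088
After the exchange:
Figure PCTCN2019083494-appb-000089
Round 8:
Figure PCTCN2019083494-appb-000090
After the update:
Figure PCTCN2019083494-appb-000091
The non-diagonal elements of the matrix have now reached the convergence condition (the parallel Jacobi calculation drives the non-diagonal elements toward 0; because the implementation represents fractions with finite-length fixed-point numbers, the non-diagonal elements can reach exactly 0, but an error is also introduced). At this point, in
Figure PCTCN2019083494-appb-000092
the elements on the diagonal are the obtained eigenvalues. The obtained eigenvalues are used in the signal DOA (direction of arrival) estimation algorithm. As the figure below shows, the power spectrum function of the MUSIC (multiple signal classification) algorithm has a peak at 0 degrees, confirming that the invention functions correctly.
Figure PCTCN2019083494-appb-000093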
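This embodiment reports the performance of the invention in practical use in terms of running time and FPGA resource consumption.
Running time: because the data are 16-bit fixed-point numbers, the CORDIC algorithm iterates 16 times internally. Including result compensation, one CORDIC algorithm cycle is 17 FPGA clock cycles. Element exchange between steps of the parallel Jacobi calculation takes 1 more clock cycle, so one step requires 18 clock cycles in total. In this example the convergence condition is that the largest absolute value among the non-diagonal elements of the covariance matrix is less than 0.001. It is reached after 8 iterations, taking 144 clock cycles. At the 250 MHz clock frequency used in this example, this is 0.576 microseconds.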
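The running-time figures above follow from simple arithmetic, reproduced here as a small sketch. The cycle breakdown (N iterations, one compensation cycle, one exchange cycle) is taken from the text; the function and parameter names are hypothetical.

```python
def jacobi_runtime_us(data_bits=16, steps=8, clock_mhz=250.0):
    """Return (cycles per step, total cycles, run time in microseconds)."""
    cordic_cycle = data_bits + 1        # N iterations plus one compensation cycle
    step_cycles = cordic_cycle + 1      # plus one cycle for element exchange
    total_cycles = step_cycles * steps
    return step_cycles, total_cycles, total_cycles / clock_mhz


step_cycles, total_cycles, runtime_us = jacobi_runtime_us()
```

With the defaults this gives 18 cycles per step, 144 cycles in total and 0.576 microseconds, matching the reported values.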
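Resource consumption: the Verilog program implementing this example was synthesized in Vivado 2017.1. The design uses 2360 LUTs (look-up tables) and 688 REGs (registers), 0.54% and 0.79‰ of the total resources respectively, so it occupies only a small share of the FPGA.
In the traditional parallel Jacobi implementation, a diagonal processing unit needs one CORDIC algorithm cycle to solve for the rotation angle and then applies the CORDIC algorithm twice in succession, using that angle, to update its elements, for a total of 3 CORDIC algorithm cycles. A non-diagonal processing unit must wait for the diagonal processing units to solve for the rotation angles and then also applies the CORDIC algorithm twice in succession to update its elements; the two rotation angles are the one passed by the diagonal processing unit in the same row and the one passed by the diagonal processing unit in the same column. With all processing units working in parallel, one step of the parallel Jacobi calculation requires at least 3 CORDIC algorithm cycles. In the invention, all CORDIC instances work in parallel and only one CORDIC algorithm cycle is required. Table 2 compares the processing unit procedures of the invention and the traditional method.
Table 2: Comparison of the processing unit procedures of the invention and the existing parallel Jacobi method
Figure PCTCN2019083494-appb-000094
The invention therefore solves for eigenvalues significantly faster than the traditional method, and is valuable when practical engineering requires fast eigenvalue decomposition.
Equivalent structural changes made by those skilled in the art on the basis of the description and drawings all fall within the patent scope of the invention.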

Claims (7)

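  1. A parallel Jacobi calculation acceleration implementation method for FPGA, characterized in that it comprises:
    1) Processing unit initialization
    The data of the n×n matrix is input into the FPGA and rotated by the parallel Jacobi calculation. The parallel Jacobi calculation uses the CORDIC algorithm for plane rotation, and a two-dimensional xy coordinate system is established for the plane rotation;
    The FPGA contains multiple processing units arranged in an array. Each processing unit is connected to its adjacent processing units through data interfaces for data interaction and element exchange. The elements of the n×n matrix on which the parallel Jacobi calculation is performed are allocated to the processing units P_ij according to the following formula:
    Figure PCTCN2019083494-appb-100001
    where P_ij denotes the processing unit in the i-th row and j-th column, a_{2i,2j} denotes the element in row 2i and column 2j of the n×n matrix, and n denotes the dimension of the matrix;
    A processing unit P_ij with i = j is a diagonal processing unit; otherwise it is a non-diagonal processing unit. Within a processing unit P_ij, the elements whose subscripts satisfy 2i = 2j or 2i-1 = 2j-1 are diagonal elements; the others are non-diagonal elements;
    2) The diagonal processing unit calculates the sign set corresponding to the rotation angle 2θ and outputs it to the non-diagonal processing units
    The sign set {d_{2θ,k}}, k = 1, 2, ..., N, corresponding to the CORDIC rotation angle 2θ is obtained iteratively with the following formulas; the total number of iterations equals that of the CORDIC algorithm:
    Figure PCTCN2019083494-appb-100002
    Figure PCTCN2019083494-appb-100003
    where k denotes the iteration index and N the total number of iterations, N being taken as the number of data bits used by the FPGA; α_k denotes the first sign parameter of the k-th iteration, β_k the second sign parameter of the k-th iteration, θ_0 the initial value of the rotation angle (i.e. 2θ), θ_k the remaining rotation angle after k iterations, φ_{k-1} the angle parameter of the (k-1)-th iteration, and d_{2θ,k} the sign corresponding to the rotation angle 2θ in the k-th iteration;
    Finally, the diagonal processing unit outputs the sign set {d_{2θ,k}} it has calculated for the rotation angle 2θ to the non-diagonal processing units in the same row and the same column as itself;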
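    3) Diagonal processing unit element update
    The d_{2θ,k} obtained in each iteration of step 2) is used as the rotation sign of the k-th CORDIC iteration, and the CORDIC algorithm rotates the first coordinate to be rotated, (2a_pq, a_pp - a_qq), in the plane by the angle 2θ;
    After all iterations of step 2) are complete, the final plane rotation result is multiplied by the first compensation factor to obtain the rotated y coordinate, y_1 = 2a_pq sin 2θ + (a_pp - a_qq) cos 2θ. The first compensation factor is given by:
    Figure PCTCN2019083494-appb-100004
    where C_1 denotes the first compensation factor;
    The diagonal elements of the diagonal processing unit are then updated with the following formulas, and the non-diagonal elements are set to 0:
    Figure PCTCN2019083494-appb-100005
    Figure PCTCN2019083494-appb-100006
    where a'_pp and a'_qq denote the two updated diagonal elements of the diagonal processing unit, and y_1 denotes the y coordinate of the first coordinate after rotation;
    4) Non-diagonal processing unit element update
    5) Element exchange between processing units
    6) After the exchange, the Jacobi calculation has updated every non-diagonal element held in the diagonal processing units of the n×n matrix once. The method returns to step 2) for the next round of processing and updating. The update process is repeated so that the non-diagonal elements of the n×n matrix gradually converge to 0; updating ends once the preset convergence accuracy is reached, and the parallel Jacobi calculation ends.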
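  2. The parallel Jacobi calculation acceleration implementation method for FPGA according to claim 1, characterized in that: in step 2), at the start of the iteration, the initial rotation angle corresponding to the non-diagonal elements in the diagonal processing unit is θ, calculated as:
    Figure PCTCN2019083494-appb-100007
    α_0 = 2a_pq, β_0 = a_pp - a_qq
    where a_pq and a_qp denote the two non-diagonal elements initially held by the diagonal processing unit, with a_qp = a_pq; a_pp and a_qq denote the diagonal elements initially held by the diagonal processing unit; α_0 denotes the initial first sign parameter and β_0 the initial second sign parameter.
  3. The parallel Jacobi calculation acceleration implementation method for FPGA according to claim 1, characterized in that: the n×n matrix is the covariance matrix of data collected by an antenna array or of image data before dimensionality reduction, and is a real symmetric matrix.
  4. The parallel Jacobi calculation acceleration implementation method for FPGA according to claim 1, characterized in that: in step 1), if n is odd, the matrix is expanded to an even-dimensional matrix by adding an (n+1)-th column and an (n+1)-th row whose elements are all 0.
  5. The parallel Jacobi calculation acceleration implementation method for FPGA according to claim 1, characterized in that step 4) comprises: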
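    4.1) The non-diagonal processing unit P_ij receives the sign sets output by the two diagonal processing units P_ii and P_jj, denoted
    Figure PCTCN2019083494-appb-100008
    Figure PCTCN2019083494-appb-100009
    which denote the signs corresponding to the rotation angles 2θ_i and 2θ_j in the k-th iteration. The following formulas are used to calculate the two signs
    Figure PCTCN2019083494-appb-100010
    Figure PCTCN2019083494-appb-100011
    thereby obtaining the two sign sets
    Figure PCTCN2019083494-appb-100012
    Figure PCTCN2019083494-appb-100013
    Figure PCTCN2019083494-appb-100014
    Figure PCTCN2019083494-appb-100015
    where
    Figure PCTCN2019083494-appb-100016
    Figure PCTCN2019083494-appb-100017
    denote the signs corresponding to the two combined rotation angles formed from θ_i and θ_j (their difference and their sum); 2θ_i and 2θ_j denote twice the rotation angles corresponding to the non-diagonal elements of the two diagonal processing units P_ii and P_jj;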
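    4.2) The following formulas are used to calculate, for the two sign sets
    Figure PCTCN2019083494-appb-100018
    Figure PCTCN2019083494-appb-100019
    and the first
    Figure PCTCN2019083494-appb-100020
    signs of each, the values of the second and third compensation factors for every possible sign combination. One sign combination consists of
    Figure PCTCN2019083494-appb-100021
    signs. The second and third compensation factor values for the different sign combinations form the look-up table data, and the absolute values of the first
    Figure PCTCN2019083494-appb-100022
    signs form the look-up address. The look-up table is generated in Block Memory (block RAM); its number of address bits is
    Figure PCTCN2019083494-appb-100023
    and its data depth is
    Figure PCTCN2019083494-appb-100024
    Figure PCTCN2019083494-appb-100025
    Figure PCTCN2019083494-appb-100026
    where C_2 denotes the second compensation factor and C_3 the third compensation factor;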
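    4.3) For a non-diagonal processing unit, denote the four elements it holds as
    Figure PCTCN2019083494-appb-100027
    The obtained
    Figure PCTCN2019083494-appb-100028
    is used as the rotation sign of the k-th CORDIC iteration. For the second coordinate to be rotated,
    Figure PCTCN2019083494-appb-100029
    the CORDIC algorithm performs a plane rotation by the corresponding combined rotation angle of θ_i and θ_j. The plane rotation result is multiplied by the second compensation factor, whose value is read from the look-up table of step 4.2), giving the rotated coordinates:
    Figure PCTCN2019083494-appb-100030
    where x_2 and y_2 denote the coordinates of the second coordinate after rotation;
    The obtained
    Figure PCTCN2019083494-appb-100031
    is used as the rotation sign of the k-th CORDIC iteration. For the third coordinate to be rotated,
    Figure PCTCN2019083494-appb-100032
    the CORDIC algorithm performs a plane rotation by the corresponding combined rotation angle of θ_i and θ_j. The plane rotation result is multiplied by the third compensation factor, whose value is read from the look-up table of step 4.2), giving the rotated coordinates:
    Figure PCTCN2019083494-appb-100033
    where x_3 and y_3 denote the coordinates of the third coordinate after rotation;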
    4.4) The elements of the off-diagonal processing unit are then updated with the following formulas:
    Figure PCTCN2019083494-appb-100034
    Figure PCTCN2019083494-appb-100035
    Figure PCTCN2019083494-appb-100036
    Figure PCTCN2019083494-appb-100037
    where
    Figure PCTCN2019083494-appb-100038
    Figure PCTCN2019083494-appb-100039
    denote the four elements contained in the off-diagonal processing unit.
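    The sign-driven CORDIC rotation and the compensation factor of steps 4.1) to 4.3) can be illustrated with a small software model. This is only a sketch under assumptions: the exact sign and compensation formulas are carried by the figures and are not reproduced here, so the code uses the textbook CORDIC recurrence, lets each sign take a value in {-1, 0, +1}, and derives the compensation factor from the absolute values of the signs (a sign of 0 skips the micro-rotation and contributes no gain). The names `cordic_signs`, `compensation` and `cordic_rotate` are hypothetical.

```python
import math

def cordic_signs(angle, iterations=16):
    """Rotation-mode sign sequence d_k in {+1, -1} approximating `angle` (|angle| < ~1.74 rad)."""
    signs = []
    residual = angle
    for k in range(iterations):
        d = 1 if residual >= 0 else -1
        signs.append(d)
        residual -= d * math.atan(2.0 ** -k)
    return signs

def compensation(signs):
    """Scale factor undoing the CORDIC gain; depends only on the absolute values of the signs."""
    gain = 1.0
    for k, d in enumerate(signs):
        gain *= math.sqrt(1.0 + (abs(d) * 2.0 ** -k) ** 2)
    return 1.0 / gain

def cordic_rotate(x, y, signs):
    """Apply shift-and-add micro-rotations for a given sign sequence, then compensate."""
    for k, d in enumerate(signs):
        x, y = x - d * y * 2.0 ** -k, y + d * x * 2.0 ** -k
    c = compensation(signs)  # in hardware this value would come from a lookup table
    return x * c, y * c

signs = cordic_signs(0.6)
print(cordic_rotate(1.0, 0.0, signs))  # close to (cos 0.6, sin 0.6)
```

    Because the sign sequence is computed once and then reused, an off-diagonal unit in this model does not need to evaluate any angle itself; it only replays signs received from elsewhere, which is the idea the claim relies on.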
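  6. The method of realizing accelerated parallel Jacobi computing for FPGA according to claim 1, characterized in that in step 5), after the elements of each processing unit have been updated, elements are exchanged between the updated processing units:
    5.A) The diagonal elements of the diagonal processing units are exchanged. Let the current diagonal processing unit P_ii contain the diagonal elements
    Figure PCTCN2019083494-appb-100040
    Figure PCTCN2019083494-appb-100041
    Then, for diagonal element
    Figure PCTCN2019083494-appb-100042
    where i denotes the row/column index of the diagonal processing unit: if i=1, diagonal element
    Figure PCTCN2019083494-appb-100043
    is unchanged; if i=2, diagonal element
    Figure PCTCN2019083494-appb-100044
    takes the value of diagonal element
    Figure PCTCN2019083494-appb-100045
    as its new value; if i>2, diagonal element
    Figure PCTCN2019083494-appb-100046
    takes the value of diagonal element
    Figure PCTCN2019083494-appb-100047
    as its new value. For diagonal element
    Figure PCTCN2019083494-appb-100048
    Figure PCTCN2019083494-appb-100049
    diagonal element
    Figure PCTCN2019083494-appb-100050
    takes the value of
    Figure PCTCN2019083494-appb-100051
    as its new value; if
    Figure PCTCN2019083494-appb-100052
    diagonal element
    Figure PCTCN2019083494-appb-100053
    takes the value of diagonal element
    Figure PCTCN2019083494-appb-100054
    as its new value;
    5.B) The off-diagonal elements of the diagonal processing units and the elements of the off-diagonal processing units are exchanged, both being relocated in the same way: each such element is moved so that its row subscript equals the row index of the diagonal element that, after the exchange of step 5.A), lies in the same row as the moved element, and its column subscript equals the column index of the diagonal element that, after the exchange of step 5.A), lies in the same column.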
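    The exact exchange rule of step 5.A) depends on the figures above, which are not reproduced in the text. As a hedged illustration of what such an exchange achieves, the sketch below implements a standard round-robin (Brent-Luk style) schedule, which may differ in detail from the claimed rule: the first index of the first unit stays fixed and all other indices circulate, so that after n-1 steps every index pair (p, q) has shared a diagonal processing unit exactly once. The function name `round_robin_schedule` is hypothetical.

```python
def round_robin_schedule(n):
    """Return n-1 steps; each step lists the n/2 disjoint index pairs held by the diagonal units."""
    if n % 2 != 0 or n < 2:
        raise ValueError("n must be a positive even number")
    top = list(range(0, n, 2))      # first diagonal index held by each unit
    bottom = list(range(1, n, 2))   # second diagonal index held by each unit
    steps = []
    for _ in range(n - 1):
        steps.append([tuple(sorted(pair)) for pair in zip(top, bottom)])
        # Unit 1 keeps its first index; unit 2 receives unit 1's second index;
        # units i > 2 receive the first index of unit i-1.
        new_top = [top[0]] + ([bottom[0]] + top[1:-1] if n > 2 else [])
        # Each unit receives the second index of unit i+1; the last unit
        # receives its own former first index.
        new_bottom = bottom[1:] + [top[-1]] if n > 2 else bottom[:]
        top, bottom = new_top, new_bottom
    return steps

for step in round_robin_schedule(4):
    print(step)
```

    With n = 4 the three steps pair the indices as (0,1),(2,3), then (0,3),(1,2), then (0,2),(1,3), covering all six pairs.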
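  7. The method of realizing accelerated parallel Jacobi computing for FPGA according to claim 1, characterized in that: step 2), step 3) and step 4) are performed simultaneously.
PCT/CN2019/083494 2019-04-10 2019-04-19 Method of realizing accelerated parallel Jacobi computing for FPGA WO2020206716A1 (zh)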

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/420,682 US20220100815A1 (en) 2019-04-10 2019-04-19 Method of realizing accelerated parallel jacobi computing for fpga

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910285351.4A CN110110285B (zh) 2019-04-10 2019-04-10 Method of realizing accelerated parallel Jacobi computing for FPGA
CN201910285351.4 2019-04-10

Publications (1)

Publication Number Publication Date
WO2020206716A1 (zh) 2020-10-15

Family

ID=67484004

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/083494 WO2020206716A1 (zh) 2019-04-10 2019-04-19 Method of realizing accelerated parallel Jacobi computing for FPGA

Country Status (3)

Country Link
US (1) US20220100815A1 (zh)
CN (1) CN110110285B (zh)
WO (1) WO2020206716A1 (zh)

Also Published As

Publication number Publication date
CN110110285A (zh) 2019-08-09
CN110110285B (zh) 2020-05-22
US20220100815A1 (en) 2022-03-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19924614

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19924614

Country of ref document: EP

Kind code of ref document: A1