CN111984056B - Numerically controlled oscillator based on GPU (graphics processing unit) texture cache and accumulated error compensation, and implementation method

Publication number
CN111984056B
Authority
CN
China
Prior art keywords
lookup table
phase
nco
data
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010662304.XA
Other languages
Chinese (zh)
Other versions
CN111984056A (en)
Inventor
马宏
焦义文
陈永强
吴涛
杨文革
刘燕都
张宝玲
张威
蔡洋
曹玉凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Original Assignee
Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Priority to CN202010662304.XA
Publication of CN111984056A
Application granted
Publication of CN111984056B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/02 Digital function generators
    • G06F1/03 Digital function generators working, at least partly, by table look-up
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03D DEMODULATION OR TRANSFERENCE OF MODULATION FROM ONE CARRIER TO ANOTHER
    • H03D7/00 Transference of modulation from one carrier to another, e.g. frequency-changing
    • H03D7/16 Multiple-frequency-changing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/28 Indexing scheme for image data processing or generation, in general involving image processing hardware

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Power Engineering (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stabilization Of Oscillator, Synchronisation, Frequency Synthesizers (AREA)

Abstract

The invention discloses a numerically controlled oscillator (NCO) based on GPU texture caching and accumulated error compensation, together with an implementation method, and belongs to the technical field of communication. The invention exploits the flexibility and the efficient parallel data processing capability of a graphics processing unit (GPU) to design and implement an efficient, high-precision NCO. The scheme comprises the following steps: a sine lookup table LUT is constructed and stored in the texture memory of the GPU. The GPU receives input data and processes the data in segments. The initial NCO phase corresponding to the i-th segment of input data and the NCO phase corresponding to the iL-th data point are accumulated to obtain the accumulated phase φ_out(iL). The accumulated phase is used to construct a lookup table index value index(iL), and the sine lookup table LUT in the texture cache is looked up with index(iL) to obtain the sine and cosine quadrature NCO signals.

Description

GPU (graphics processing Unit) texture cache and accumulated error compensation based numerically-controlled oscillator and implementation method
Technical Field
The invention relates to the technical field of communication, and in particular to a numerically controlled oscillator based on GPU texture caching and accumulated error compensation and to a method of implementing it.
Background
A Digital Down Converter (DDC) system is an important subsystem of a modern aerospace measurement and control system. In a typical DDC system, a digital local oscillator is its most complex core device.
In a conventional DDC system, the digital local oscillator is mainly implemented as an NCO (Numerically Controlled Oscillator). A typical NCO consists of an N-bit phase accumulator, a phase register, and an M-bit sine lookup table. Increasing N and M improves the frequency and phase resolution, but the phase resolution of the lookup table is limited by the on-chip storage of the FPGA and cannot be raised effectively. In recent years, as on-chip storage has grown, the lookup table method has been widely used because it consumes few computing resources and is fast, but the underlying problem has not been solved. The CORDIC (Coordinate Rotation Digital Computer) algorithm proposed by Jack Volder in 1959 offers an alternative. It approximates the function value by shifts, additions, subtractions and iteration instead of a table lookup, which saves scarce on-chip storage, but the additional iterations consume more computing resources. To date, achieving high-precision phase resolution still requires a compromise between computing resources and storage space.
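For reference, the conventional structure described above (an N-bit phase accumulator whose upper M bits address a sine table) can be sketched as follows. This is an illustration only and is not taken from the patent; the function names make_lut and nco_samples and the widths N = 32 and M = 10 are assumptions chosen for the example.

```python
import math

def make_lut(m_bits):
    """Full-period sine table with 2**m_bits entries."""
    size = 1 << m_bits
    return [math.sin(2.0 * math.pi * k / size) for k in range(size)]

def nco_samples(f_lo, f_s, count, n_bits=32, m_bits=10):
    """Classic integer NCO: the top M bits of an N-bit accumulator address the table."""
    lut = make_lut(m_bits)
    step = round(f_lo / f_s * (1 << n_bits))   # frequency control word
    mask = (1 << n_bits) - 1
    acc = 0
    out = []
    for _ in range(count):
        out.append(lut[acc >> (n_bits - m_bits)])  # truncate the phase to M bits
        acc = (acc + step) & mask                  # integer wrap-around is exact
    return out

samples = nco_samples(9e6, 64e6, 64)
```

Because the accumulator is an integer, wrap-around introduces no rounding error; the table size alone limits the phase resolution, which is the storage trade-off the background describes.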
The GPU offers an effective way to address these problems: with CUDA, its efficient floating-point arithmetic and multi-level memory system can provide an efficient, high-precision sine lookup table for implementing the digital local oscillator. In 2016, a Sichuan University team designed a digital local oscillator using the lookup table method and achieved a 4-fold speedup over direct calculation; however, the frequency precision of that method is limited by the number of threads in a block and is difficult to improve. Scott C. Kim et al. [2] used texture memory nearest-neighbor and linear interpolation to realize output of arbitrary bandwidth. Their results show that the mean square error (MSE) between texture interpolation and traditional resampling is about 4.11e-4, that the MSE between nearest-neighbor and linear interpolation is about 1e-5, and that linear interpolation is slightly better than nearest-neighbor interpolation; however, the method does not address the phase accumulation error and its precision is lower.
Although the GPU-based NCO implementation is flexible and efficient, floating-point arithmetic on the GPU accumulates, during phase accumulation, the rounding errors introduced by the exponent-alignment step of floating-point addition. When the number of accumulated points is large, the accumulated error becomes significant and must be eliminated with a suitably designed algorithm.
How to design and realize a numerically controlled oscillator NCO with high efficiency and high precision by utilizing the high flexibility and the high-efficiency parallel data processing capability of a Graphic Processing Unit (GPU) is a problem to be solved urgently at present.
Disclosure of Invention
In view of this, the present invention provides a numerically controlled oscillator based on GPU texture caching and accumulated error compensation and an implementation method thereof, which can design and implement a high-efficiency and high-precision numerically controlled oscillator NCO by using the high flexibility and high-efficiency parallel data processing capability of a Graphics Processing Unit (GPU).
In order to achieve the purpose, the technical scheme of the invention is as follows: a numerical control oscillator NCO realization method based on GPU texture caching and accumulated error compensation comprises the following steps:
step one, constructing a sine lookup table LUT, and storing the sine lookup table LUT by using a texture memory of a GPU (graphics processing unit).
And step two, the GPU receives input data and carries out segmented processing on the data.
The total number of points of the input data is nLT. When the data currently being processed is the i-th segment of input data, the final NCO phase corresponding to the previous segment, i.e. the (i-1)-th segment,
Figure BDA0002579060180000021
is the initial NCO phase of the i-th segment.
Step three, for the i-th segment of data with length nL, the phase corresponding to the data point with index iL is:
Figure BDA0002579060180000031
where f_LO is the local oscillation frequency of the numerically controlled oscillator NCO; f_s is the sampling rate of the input data; and iL is the data index, taking values from 0 to nL-1.
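The formula image is not reproduced in this text. From the definitions of f_LO, f_s and iL, the per-sample phase presumably has the usual NCO form below; the symbol φ(iL) is an assumption introduced here for illustration and is not necessarily the notation of the original formula.

```latex
\varphi(iL) = 2\pi \,\frac{f_{LO}}{f_s}\, iL, \qquad iL = 0, 1, \dots, nL-1
```

With the frequencies used in the simulation later in the description (f_LO = 9 MHz, f_s = 64 MHz), this form gives a phase increment of 2π·9/64 ≈ 0.8836 rad per sample.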
Step four, the initial NCO phase of the i-th segment of input data and the phase corresponding to the iL-th data point are accumulated in the following manner to obtain the accumulated phase φ_out(iL).
The initial phase of the i-th segment is taken as the first input value a, and the phase corresponding to the iL-th data point as the second input value b. The summation deviation of a is da = a' - [b' - (φ_out(iL) - a')]; the summation deviation of b is db = b' - (φ_out(iL) - a').
The corrected value of a is a' = a + da; the corrected value of b is b' = b + db.
The accumulated phase is:
φ_out(iL) = a' + b'.
Step five, the accumulated phase φ_out(iL) is used to construct a lookup table index value index(iL), and the sine lookup table LUT in the texture memory is looked up with index(iL) to obtain the sine and cosine quadrature NCO signals.
Further, in step five, the lookup table index value index(iL) is constructed from the accumulated phase φ_out(iL) as follows:
Figure BDA00025790601800000314
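The index formula is likewise only available as an image. A plausible form, assuming that nLT in the lookup expressions denotes the number of table entries per period (the cosine lookup uses index + nLT/4, described as a shift of 1/4 period), is:

```latex
\mathrm{index}(iL) = \frac{\varphi_{out}(iL) \bmod 2\pi}{2\pi}\; nLT
```

This is a sketch under that assumption; the original formula may scale or wrap the phase differently.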
Further, in step five, looking up the sine lookup table LUT in the texture memory with the lookup table index value index(iL) to obtain the sine and cosine quadrature NCO outputs is specifically:
The sine lookup table LUT in the texture memory is looked up with index(iL) to obtain the sine NCO output.
The index value index(iL) is shifted by 1/4 period, and the sine lookup table LUT in the texture memory is looked up again to obtain the cosine NCO output.
Further, the sine NCO output is NCO_I(iL):
NCO_I(iL) = lookup(LUT, index);
where lookup is the lookup function of the lookup table.
The cosine NCO output is NCO_Q(iL):
NCO_Q(iL) = lookup(LUT, index + nLT/4).
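The two lookups can be illustrated with a small CPU-side sketch. It assumes a 1024-entry full-period table and a nearest-entry lookup standing in for the texture fetch; the names N_LT and nco_iq and the phase-to-index scaling are assumptions made for the example.

```python
import math

N_LT = 1024  # table length (assumed; the description later uses a 1024-point table)
LUT = [math.sin(2.0 * math.pi * k / N_LT) for k in range(N_LT)]

def lookup(lut, index):
    """Nearest-entry lookup with wrap-around, standing in for a texture fetch."""
    return lut[int(round(index)) % len(lut)]

def nco_iq(phase):
    """Return the (sine, cosine) outputs for an accumulated phase in radians."""
    index = (phase / (2.0 * math.pi)) * N_LT
    nco_i = lookup(LUT, index)              # sine branch
    nco_q = lookup(LUT, index + N_LT / 4)   # cosine branch: quarter-period shift
    return nco_i, nco_q
```

A single table therefore serves both branches, since cos(x) = sin(x + π/2).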
Further, in step one, only the first 1/4 period of the sine lookup table is stored in the texture memory of the GPU; in step five, before the sine lookup table LUT in the texture memory is looked up with index(iL), index(iL) is converted to the corresponding angle within the first 1/4 period.
Another embodiment of the present invention further provides a numerically controlled oscillator based on GPU texture caching and accumulated error compensation which, as shown in fig. 2, comprises a phase calculation module, a phase accumulation module, and a lookup table module built on a graphics processing unit (GPU).
The phase calculation module receives input data whose total number of points is nLT and processes the data in segments. When the data currently being processed is the i-th segment of input data, the final NCO phase corresponding to the previous segment, i.e. the (i-1)-th segment,
Figure BDA0002579060180000041
is the initial NCO phase of the i-th segment. For the i-th segment of data with length nL, the phase calculation module calculates the phase corresponding to the data point with index iL as:
Figure BDA0002579060180000042
where f_LO is the local oscillation frequency of the numerically controlled oscillator NCO; f_s is the sampling rate of the input data; and iL is the data index, taking values from 0 to nL-1.
The phase accumulation module accumulates the initial NCO phase of the i-th segment of input data and the phase corresponding to the iL-th data point to obtain the accumulated phase φ_out(iL). The accumulated phase is:
φ_out(iL) = a' + b',
where the initial phase of the i-th segment is taken as the first input value a and the phase corresponding to the iL-th data point as the second input value b; the summation deviation of a is da = a' - [b' - (φ_out(iL) - a')]; the summation deviation of b is db = b' - (φ_out(iL) - a'); the corrected value of a is a' = a + da; and the corrected value of b is b' = b + db.
The lookup table module constructs a sine lookup table LUT and stores it in the texture memory of the GPU. The lookup table module uses the accumulated phase φ_out(iL) to construct a lookup table index value index(iL), and looks up the sine lookup table LUT in the texture memory with index(iL) to obtain the sine and cosine quadrature NCO outputs.
Furthermore, the phase accumulation module is implemented with seven adders, namely a first adder to a seventh adder.
The first adder takes a and da as inputs and outputs the corrected value a'.
The second adder takes b and db as inputs and outputs the corrected value b'.
The third adder takes a' and b' as inputs and outputs the accumulated phase φ_out(iL).
The fourth adder takes the accumulated phase φ_out(iL) and -a' as inputs and outputs a first intermediate quantity b1:
Figure BDA0002579060180000055
The fifth adder takes b' and -b1 as inputs and outputs a second intermediate quantity a1:
Figure BDA0002579060180000056
The sixth adder takes a' and -a1 as inputs and outputs the summation deviation da of a.
The seventh adder takes b' and -b1 as inputs and outputs the summation deviation db of b.
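Read as equations, the seven adders amount to the following. This is a transcription of the wording of this text only, since the formula images and fig. 3 are not available here; da and db in the first two lines are taken to be the deviations produced at the previous accumulation step.

```latex
\begin{aligned}
a' &= a + da, & b' &= b + db, & \varphi_{out}(iL) &= a' + b',\\
b_1 &= \varphi_{out}(iL) - a', & a_1 &= b' - b_1, & &\\
da &= a' - a_1, & db &= b' - b_1. & &
\end{aligned}
```

As worded, the fifth and seventh adders receive the same inputs; the figures would be needed to confirm the exact wiring.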
Advantageous effects:
1. The numerically controlled oscillator based on GPU texture caching and accumulated error compensation, and its implementation method, provided by the embodiments of the invention produce correct NCO output using a GPU-based lookup table method. In implementing the NCO with a lookup table in the GPU texture cache, the linear interpolation and caching advantages of the texture cache are fully exploited to achieve efficient, high-precision NCO output. Finally, to address the common problem of large accumulated error in floating-point phase accumulation in NCO calculation, a GPU-based comprehensive compensation algorithm for floating-point phase accumulation error is designed using error-free transformation; with this algorithm the floating-point phase accumulation error is kept on the order of 1e-5, which effectively improves the phase calculation precision.
2. The invention provides an optimized lookup table design for the sine lookup table: data for only 1/4 period is stored in the space of the original full-period table, which effectively increases the phase resolution and improves the spurious suppression of the output signal.
Drawings
Fig. 1 is a flowchart of a method for implementing a numerically controlled oscillator based on GPU texture caching and accumulated error compensation according to an embodiment of the present invention;
fig. 2 is a block diagram of a numerically controlled oscillator based on GPU texture caching and accumulated error compensation according to an embodiment of the present invention;
FIG. 3 is a block diagram of a phase accumulation module in a numerically controlled oscillator based on GPU texture caching and accumulated error compensation according to an embodiment of the present invention;
FIG. 4 is a waveform diagram of a complete cycle phase sine lookup table in an embodiment of the present invention;
FIG. 5 is an exemplary diagram of an output NCO spectrum corresponding to a complete cycle phase sine lookup table in an embodiment of the present invention;
FIG. 6 is a waveform diagram of an optimized lookup table in an embodiment of the present invention;
FIG. 7 is an exemplary diagram of an output NCO spectrum corresponding to an optimized lookup table in an embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating absolute error comparison between data based on the 2Sum algorithm and the float algorithm in the embodiment of the present invention.
Detailed Description
The invention is described in detail below by way of example with reference to the accompanying drawings.
The invention provides a numerical control oscillator NCO realization method based on GPU texture caching and accumulated error compensation, as shown in figure 1, comprising the following steps:
step one, constructing a sine lookup table LUT, and storing the sine lookup table LUT by using a texture memory of a GPU (graphics processing unit).
The texture memory of the GPU offers hardware linear interpolation, and its cache is advantageous for particular read patterns.
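The benefit of linear interpolation over a nearest-entry lookup can be illustrated on the CPU. The sketch below only mirrors the arithmetic of interpolating between two neighboring table entries; it is not a texture fetch, and the table length and function names are assumptions. On CUDA hardware the texture unit performs the interpolation with low-precision fixed-point weights, so actual results may differ slightly.

```python
import math

N_LT = 1024  # assumed table length
LUT = [math.sin(2.0 * math.pi * k / N_LT) for k in range(N_LT)]

def lookup_linear(lut, index):
    """Interpolate linearly between the two table entries that bracket index."""
    n = len(lut)
    base = math.floor(index)
    frac = index - base
    lo = lut[base % n]
    hi = lut[(base + 1) % n]        # wrap at the end of the period
    return (1.0 - frac) * lo + frac * hi

def lookup_nearest(lut, index):
    """Nearest-entry lookup, for comparison."""
    return lut[int(math.floor(index + 0.5)) % len(lut)]

# Compare both lookups against math.sin at a phase that falls between table entries.
phase = 1.0
index = phase / (2.0 * math.pi) * N_LT
err_linear = abs(lookup_linear(LUT, index) - math.sin(phase))
err_nearest = abs(lookup_nearest(LUT, index) - math.sin(phase))
```

For a smooth function such as the sine, the interpolation error scales with the square of the table spacing, whereas the nearest-entry error scales linearly with it.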
And step two, the GPU receives input data and carries out segmented processing on the data.
The total number of points of the input data is nLT. When the data currently being processed is the i-th segment of input data, the final NCO phase corresponding to the previous segment, i.e. the (i-1)-th segment,
Figure BDA0002579060180000061
is the initial NCO phase of the i-th segment.
Step three, for the i-th segment of data with length nL, the phase corresponding to the data point with index iL is:
Figure BDA0002579060180000071
where f_LO is the local oscillation frequency of the numerically controlled oscillator NCO; f_s is the sampling rate of the input data; and iL is the data index, taking values from 0 to nL-1.
Step four, the initial NCO phase of the i-th segment of input data and the phase corresponding to the iL-th data point are accumulated in the following manner to obtain the accumulated phase φ_out(iL).
In NCO calculation, phase accumulation is an indispensable core step. However, floating-point accumulation with round-to-nearest inevitably accumulates phase error; as the number of accumulated points increases, the error grows and may eventually render the output data completely wrong. Double-precision floating-point arithmetic can reduce the accumulated error, but it degrades the operating efficiency of the system. The invention therefore provides a comprehensive compensation algorithm for floating-point phase accumulation error based on error-free transformation, which effectively reduces the accumulated phase error while preserving operating efficiency.
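The effect described above can be reproduced with a small experiment. The sketch emulates single precision in Python by rounding every intermediate result to float32; the helper name f32, the choice of a plain running sum, and the use of the 9 MHz / 64 MHz frequencies from the later simulation are assumptions made for illustration, not the accumulation scheme of the invention.

```python
import math
import struct

def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate_single(step, count):
    """Repeatedly add step in emulated single precision."""
    acc = f32(0.0)
    step = f32(step)
    for _ in range(count):
        acc = f32(acc + step)   # every addition is rounded to float32
    return acc

step = 2.0 * math.pi * 9e6 / 64e6   # phase increment per sample, in radians
count = 1 << 15                     # 2^15 samples
drift = abs(accumulate_single(step, count) - step * count)
```

Once the running sum is large, each addition rounds the small increment to the coarse spacing of the sum, and the deviations add up instead of cancelling, so drift ends up far larger than a single rounding error.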
The initial phase of the i-th segment is taken as the first input value a, and the phase corresponding to the iL-th data point as the second input value b. The summation deviation of a is da = a' - [b' - (φ_out(iL) - a')]; the summation deviation of b is db = b' - (φ_out(iL) - a').
The corrected value of a is a' = a + da; the corrected value of b is b' = b + db.
The accumulated phase is:
φ_out(iL) = a' + b'.
With the accumulation scheme provided by the embodiment of the invention, the floating-point deviation incurred by each of the two addends at one accumulation step is compensated when the two numbers are next accumulated. This completes the comprehensive compensation of the accumulated phase.
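The idea of carrying each step's deviation into the next accumulation can be sketched with the textbook TwoSum error-free transformation (Knuth). Note that this is the standard algorithm, not a transcription of the seven-adder structure of the invention; the helper names and the emulation of single precision through f32 are assumptions made for the example.

```python
import math
import struct

def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum(a, b):
    """Textbook TwoSum: return (s, e) with s = fl(a + b) and s + e == a + b exactly."""
    s = f32(a + b)
    a_part = f32(s - b)            # the part of s contributed by a
    b_part = f32(s - a_part)       # the part of s contributed by b
    e = f32(f32(a - a_part) + f32(b - b_part))
    return s, e

def accumulate_compensated(step, count):
    """Single-precision running sum; each step's deviation is re-injected at the next step."""
    acc, err = 0.0, 0.0
    step = f32(step)
    for _ in range(count):
        addend, e1 = two_sum(step, err)   # fold the previous deviation into the addend
        acc, e2 = two_sum(acc, addend)
        err = f32(e1 + e2)
    return acc, err

step = 2.0 * math.pi * 9e6 / 64e6
acc, err = accumulate_compensated(step, 1 << 15)
residual = abs((acc + err) - f32(step) * (1 << 15))
```

All arithmetic stays in single precision, yet the pair (acc, err) tracks the exact sum of the single-precision increments closely, which is the behavior the compensation aims for without resorting to double precision.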
Step five, the accumulated phase φ_out(iL) is used to construct a lookup table index value index(iL), and the sine lookup table LUT in the texture memory is looked up with index(iL) to obtain the sine and cosine quadrature NCO signals.
The lookup table index value index(iL) constructed in this step is specifically:
Figure BDA00025790601800000713
The sine lookup table LUT in the texture memory is looked up with the lookup table index value index(iL) to obtain the sine NCO output.
The index value index(iL) is shifted by 1/4 period, and the sine lookup table LUT in the texture memory is looked up again to obtain the cosine NCO output.
The sine NCO output is NCO_I(iL): NCO_I(iL) = lookup(LUT, index);
where lookup is the lookup function of the lookup table.
The cosine NCO output is NCO_Q(iL): NCO_Q(iL) = lookup(LUT, index + nLT/4).
Quadrature mixing is then performed on the input data with the sine and cosine quadrature NCO signals to obtain the required intermediate-frequency signal output.
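Quadrature mixing multiplies each input sample by the cosine and sine NCO outputs. The following sketch uses math.sin and math.cos directly instead of a lookup table for brevity; the function name mix and the sign convention of the Q branch are assumptions.

```python
import math

def mix(samples, f_lo, f_s):
    """Multiply real input samples by the cosine and sine NCO outputs."""
    i_out, q_out = [], []
    for n, x in enumerate(samples):
        phase = 2.0 * math.pi * f_lo / f_s * n
        i_out.append(x * math.cos(phase))
        q_out.append(-x * math.sin(phase))   # sign convention is an assumption
    return i_out, q_out

f_s, f_lo = 64e6, 9e6
# A test tone at exactly the local oscillator frequency mixes down to DC.
tone = [math.cos(2.0 * math.pi * f_lo / f_s * n) for n in range(64)]
i_out, q_out = mix(tone, f_lo, f_s)
mean_i = sum(i_out) / len(i_out)
mean_q = sum(q_out) / len(q_out)
```

Over the 64 samples (9 full cycles of the tone) the double-frequency term averages out, leaving a DC value of 0.5 in the I branch and 0 in the Q branch.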
In the above embodiment, NCO signal output based on the sine lookup table is successfully implemented with the GPU texture cache. However, that embodiment stores one full period of sine wave data directly and does not exploit the symmetry of the sine signal. In addition, during phase calculation the accumulated phase value keeps growing as the data index increases, and data overflow may eventually occur. Based on this analysis, in an embodiment of the present invention the form in which the sine lookup table is stored in the texture memory of the GPU in step one is optimized as follows: only the first 1/4 period of the sine lookup table is stored, that is, data for 1/4 period occupies the space of the original full-period table. In step five, before the sine lookup table LUT in the texture memory is looked up with index(iL), index(iL) is converted to the corresponding angle within the first 1/4 period.
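One way to convert a full-period index to the first 1/4 period uses the symmetry of the sine wave, as sketched below. The table size, the inclusion of the endpoint sin(π/2), and the function names are assumptions made for the example; the text does not give the exact folding rule.

```python
import math

QUARTER = 1024                      # entries per quarter period (assumed)
# First quarter of the sine wave, endpoint included so that sin(pi/2) is available.
QLUT = [math.sin(0.5 * math.pi * k / QUARTER) for k in range(QUARTER + 1)]

def sine_from_quarter(index):
    """Map a full-period integer index (0 .. 4*QUARTER-1) onto the quarter table."""
    index %= 4 * QUARTER            # wrapping also keeps the index from growing without bound
    quadrant, r = divmod(index, QUARTER)
    if quadrant == 0:
        return QLUT[r]
    if quadrant == 1:
        return QLUT[QUARTER - r]    # sin(pi - x) = sin(x)
    if quadrant == 2:
        return -QLUT[r]             # sin(pi + x) = -sin(x)
    return -QLUT[QUARTER - r]       # sin(2*pi - x) = -sin(x)

def cosine_from_quarter(index):
    """Cosine is the sine advanced by a quarter period."""
    return sine_from_quarter(index + QUARTER)
```

With the same number of stored entries, the table spacing is four times finer than for a full-period table.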
In the embodiment of the present invention, the NCO spectrum obtained with a full-period phase lookup table stored in the texture memory is compared with that obtained with the optimized 1/4-period phase lookup table. The full-period phase lookup table stored in the texture memory is shown in fig. 4; it is a 1024-point single-precision floating-point sine lookup table covering one full period. With the sampling frequency set to 64 MHz and the local oscillation frequency set to 9 MHz, the spectrum of the NCO output signal is as shown in fig. 5. The 1/4-period phase lookup table stored in the texture memory is shown in fig. 6; following the optimized design, it is a 1024-point single-precision floating-point sine lookup table covering a range of π/2. With the same sampling frequency of 64 MHz and local oscillation frequency of 9 MHz, the spectrum of the NCO output signal is as shown in fig. 7. It can be seen that storing a 1/4-period phase lookup table in the texture memory effectively improves the spurious suppression of the output signal, owing to the increased phase resolution.
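The gain in phase resolution follows directly from the table spacing. Assuming uniformly spaced entries, the two 1024-point tables compared above have the spacings:

```latex
\Delta\varphi_{\text{full}} = \frac{2\pi}{1024} \approx 6.14\times10^{-3}\ \text{rad},
\qquad
\Delta\varphi_{1/4} = \frac{\pi/2}{1024} = \frac{\pi}{2048} \approx 1.53\times10^{-3}\ \text{rad}
```

The quarter-period table is thus four times finer for the same storage.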
Another embodiment of the present invention provides a numerically controlled oscillator based on GPU texture caching and accumulated error compensation which, as shown in fig. 2, comprises a phase calculation module, a phase accumulation module, and a lookup table module built on a graphics processing unit (GPU).
The phase calculation module receives input data whose total number of points is nLT and processes the data in segments. When the data currently being processed is the i-th segment of input data, the final NCO phase corresponding to the previous segment, i.e. the (i-1)-th segment,
Figure BDA0002579060180000091
is the initial NCO phase of the i-th segment. For the i-th segment of data with length nL, the phase calculation module calculates the phase corresponding to the data point with index iL as:
Figure BDA0002579060180000092
where f_LO is the local oscillation frequency of the numerically controlled oscillator NCO; f_s is the sampling rate of the input data; and iL is the data index, taking values from 0 to nL-1.
The phase accumulation module accumulates the initial NCO phase of the i-th segment of input data and the phase corresponding to the iL-th data point to obtain the accumulated phase φ_out(iL). The accumulated phase is:
φ_out(iL) = a' + b',
where the initial phase of the i-th segment is taken as the first input value a and the phase corresponding to the iL-th data point as the second input value b; the summation deviation of a is da = a' - [b' - (φ_out(iL) - a')]; the summation deviation of b is db = b' - (φ_out(iL) - a'); the corrected value of a is a' = a + da; and the corrected value of b is b' = b + db.
The lookup table module constructs a sine lookup table LUT and stores it in the texture memory of the GPU. The lookup table module uses the accumulated phase φ_out(iL) to construct a lookup table index value index(iL), and looks up the sine lookup table LUT in the texture memory with index(iL) to obtain the sine and cosine quadrature NCO outputs.
Finally, an output module may be added, which performs quadrature processing on the input data with the sine and cosine quadrature NCO outputs to obtain the output of the numerically controlled oscillator NCO.
In the embodiment of the invention, the phase accumulation module is implemented with seven adders, namely a first adder to a seventh adder; their connection is shown in fig. 3.
The first adder takes a and da as inputs and outputs the corrected value a'.
The second adder takes b and db as inputs and outputs the corrected value b'.
The third adder takes a' and b' as inputs and outputs the accumulated phase φ_out(iL).
The fourth adder takes the accumulated phase φ_out(iL) and -a' as inputs and outputs a first intermediate quantity b1:
Figure BDA0002579060180000105
The fifth adder takes b' and -b1 as inputs and outputs a second intermediate quantity a1:
Figure BDA0002579060180000106
The sixth adder takes a' and -a1 as inputs and outputs the summation deviation da of a.
The seventh adder takes b' and -b1 as inputs and outputs the summation deviation db of b.
The NCO scheme provided by the invention was used to carry out a simulation analysis of the output phase. The simulation parameters are: NCO frequency 9 MHz, sampling rate Fs = 64 MHz, and analysis data length 2^15, divided into 32 segments of 1024 points each. NCO phases computed in single precision, compensated single precision, and double precision are analyzed and compared. The simulation computes the phase by accumulation; the phase is reset every 1024 points to remove the whole-cycle part, and the accumulation uses the 2Sum algorithm to eliminate the floating-point calculation deviation. The deviations of the double-precision, single-precision, and 2Sum-optimized single-precision results from the theoretical values are shown in fig. 8. As can be seen from the figure, adding the 2Sum algorithm to single-precision operation greatly reduces the calculation deviation, with no phase drift; the phase deviation is better than 1e-5 rad, and the precision is greatly improved.
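The segment arithmetic for these simulation parameters can be checked exactly with rational numbers. The variable names below are illustrative.

```python
from fractions import Fraction

f_lo = Fraction(9_000_000)      # NCO frequency in Hz
f_s = Fraction(64_000_000)      # sampling rate in Hz
total = 2 ** 15                 # analysed data length
seg_len = 1024                  # points per segment

segments = total // seg_len
cycles_per_sample = f_lo / f_s
cycles_per_segment = cycles_per_sample * seg_len

# Fractional number of cycles left over after one segment, i.e. the phase
# (in cycles) that would be carried into the next segment.
carry = cycles_per_segment % 1
```

With these particular values each 1024-point segment spans exactly 144 NCO cycles, so removing the whole-cycle part every 1024 points leaves no residual phase to carry into the next segment.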
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A numerical control oscillator NCO realization method based on GPU texture caching and accumulated error compensation is characterized by comprising the following steps:
step one, constructing a sine lookup table LUT, and storing the sine lookup table LUT by using a texture memory of a Graphics Processing Unit (GPU);
step two, the GPU receives input data and carries out segmented processing on the data;
the total number of points of the input data is nLT; when the data currently being processed is the i-th segment of input data, the final NCO phase corresponding to the previous segment, i.e. the (i-1)-th segment,
Figure FDA0002952420430000011
is the initial NCO phase of the i-th segment;
step three, for the ith segment of data with the length of nL, the corresponding phase of the data with the index of iL is as follows:
Figure FDA0002952420430000012
wherein f_LO is the local oscillation frequency of the numerically controlled oscillator NCO; f_s is the sampling rate of the input data; iL is the data index, taking values from 0 to nL-1;
step four, the initial NCO phase of the i-th segment of input data and the phase corresponding to the iL-th data point are accumulated in the following manner to obtain the accumulated phase φ_out(iL):
the initial phase of the i-th segment is taken as the first input value a, and the phase corresponding to the iL-th data point as the second input value b; the summation deviation of a is da = a' - [b' - (φ_out(iL) - a')]; the summation deviation of b is db = b' - (φ_out(iL) - a');
the corrected value of a is a' = a + da;
the corrected value of b is b' = b + db;
the accumulated phase is:
φ_out(iL) = a' + b';
step five, using the accumulated phase [Figure FDA00029524204300000111] to construct a lookup table index value index(iL), and using the lookup table index value index(iL) to look up the sine lookup table LUT in the texture memory to obtain the sine and cosine quadrature NCO signals.
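The segment-to-segment phase hand-over in steps two and three can be sketched as follows. The phase formula itself is only available as an image in this text, so the sketch assumes a per-sample phase of 2π·f_LO/f_s·iL and assumes that the "tail phase" of a segment is the phase at which the next segment starts; both are assumptions, as are the function and variable names. Double precision is used throughout so that only the segmentation logic is shown.

```python
import math

TWO_PI = 2.0 * math.pi

def segment_phases(f_lo, fs, n_total, n_seg):
    """Return the wrapped NCO phase of every sample, processing n_seg samples per segment."""
    step = TWO_PI * f_lo / fs            # assumed per-sample phase increment
    phi0 = 0.0                           # initial phase of the first segment
    phases = []
    for start in range(0, n_total, n_seg):
        length = min(n_seg, n_total - start)
        for il in range(length):
            # phase of sample iL = initial phase of the segment + in-segment phase
            phases.append(math.fmod(phi0 + step * il, TWO_PI))
        # tail phase of this segment becomes the initial phase of the next one,
        # with whole cycles removed so the value stays small
        phi0 = math.fmod(phi0 + step * length, TWO_PI)
    return phases

phases = segment_phases(9e6, 64e6, 4096, 1024)
print(len(phases), phases[:3])
```

Keeping the carried phase wrapped to one cycle is what lets each segment restart from a small number instead of an ever-growing accumulated value.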
2. The method of claim 1, wherein in step five, constructing the lookup table index value index(iL) from the accumulated phase [Figure FDA00029524204300000112] is specifically: [Figure FDA0002952420430000021]
3. The method of claim 2, wherein in step five, using the lookup table index value index(iL) to look up the sine lookup table LUT in the texture cache to obtain the sine and cosine quadrature NCO outputs is specifically:
looking up the sine lookup table LUT in the texture cache with the lookup table index value index(iL) to obtain the sine NCO output;
shifting the lookup table index value index(iL) by 1/4 cycle and looking up the sine lookup table LUT in the texture cache to obtain the cosine NCO output.
4. The method of claim 3, wherein the sine NCO output is NCO_I(iL):
NCO_I(iL) = lookup(LUT, index);
wherein lookup is the lookup function of the lookup table;
the cosine NCO output is NCO_Q(iL):
NCO_Q(iL) = lookup(LUT, index + nLT/4).
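The two lookups in claims 3 and 4 can be illustrated with an ordinary Python list standing in for the texture-resident table. The index formula of claim 2 is only present as an image, so the rule used here (fraction of a cycle times table length, floored) is an assumption, and the modulo in `lookup` stands in for whatever wrap-around the texture fetch provides. `build_sine_lut`, `lookup`, and `nco_iq` are hypothetical names.

```python
import math

def build_sine_lut(n):
    """One full sine cycle sampled at n points."""
    return [math.sin(2.0 * math.pi * k / n) for k in range(n)]

def lookup(lut, index):
    """Table fetch with wrap-around addressing (assumption)."""
    return lut[index % len(lut)]

def nco_iq(lut, phase):
    """Return (NCO_I, NCO_Q) for a phase given in radians."""
    n = len(lut)
    # assumed index rule: map the phase to a position within one cycle of the table
    index = int(math.floor(phase / (2.0 * math.pi) * n)) % n
    nco_i = lookup(lut, index)             # sine branch
    nco_q = lookup(lut, index + n // 4)    # cosine branch: quarter-cycle offset
    return nco_i, nco_q

lut = build_sine_lut(4096)
print(nco_iq(lut, 1.0))
```

Because the cosine branch reuses the sine table with a fixed offset, a single table serves both quadrature outputs.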
5. The method according to any one of claims 1 to 4, wherein in step one, only the first 1/4 cycle of the sine lookup table is stored in the texture memory of the GPU;
in step five, before the lookup table index value index(iL) is used to look up the sine lookup table LUT in the texture cache, the lookup table index value index(iL) is converted to the corresponding angle within the first 1/4 cycle.
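A quarter-cycle table of the kind described in claim 5 can be folded back to a full cycle using the symmetry of the sine function. The sketch below is one common way to do this and is not taken from the patent: it assumes the table length n is a multiple of 4 and stores n/4 + 1 entries so that the peak value is available; `make_quarter_lut` and `sin_from_quarter` are hypothetical names.

```python
import math

def make_quarter_lut(n):
    """First quarter cycle of an n-point sine table, including the peak (n/4 + 1 entries)."""
    return [math.sin(2.0 * math.pi * k / n) for k in range(n // 4 + 1)]

def sin_from_quarter(quarter, n, index):
    """Value of the full n-point sine table at `index`, read from the quarter table."""
    k = index % n
    quadrant, r = divmod(k, n // 4)
    if quadrant == 0:                  # 0 .. pi/2: direct read
        return quarter[r]
    if quadrant == 1:                  # pi/2 .. pi: mirrored read
        return quarter[n // 4 - r]
    if quadrant == 2:                  # pi .. 3pi/2: negated direct read
        return -quarter[r]
    return -quarter[n // 4 - r]        # 3pi/2 .. 2pi: negated mirrored read

n = 1024
quarter = make_quarter_lut(n)
print(sin_from_quarter(quarter, n, 100), sin_from_quarter(quarter, n, 100 + n // 4))
```

The cosine output follows from the same function by adding n/4 to the index, so the quarter table covers both quadrature branches at roughly a quarter of the storage.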
6. A numerically controlled oscillator based on GPU texture memory and accumulated error compensation, characterized by comprising a phase calculation module, a phase accumulation module, and a lookup table module, all built on a graphics processing unit (GPU);
the phase calculation module receives input data, the total number of points of the input data being nLT; the phase calculation module processes the data in segments; when the data currently being processed is the i-th segment of input data, the tail phase of the NCO corresponding to the previous, i.e. (i-1)-th, segment of input data, [Figure FDA0002952420430000022], is the initial phase of the NCO for the i-th segment of input data; for the i-th segment of data, of length nL, the phase calculation module calculates the phase corresponding to the data with index iL as: [Figure FDA0002952420430000031]
wherein f_LO is the local oscillator frequency of the numerically controlled oscillator NCO; f_s is the sampling rate of the input data; iL is the data index and takes values from 0 to nL-1;
the phase accumulation module is used for accumulating the initial phase of the NCO for the i-th segment of input data, [Figure FDA0002952420430000032], and the phase corresponding to the iL-th data, [Figure FDA0002952420430000033], to obtain the accumulated phase [Figure FDA0002952420430000034];
the accumulated phase is: [Figure FDA0002952420430000035]
wherein [Figure FDA0002952420430000036] is taken as a first input value a and [Figure FDA0002952420430000037] as a second input value b; the summation deviation of a is [Figure FDA0002952420430000038]; the summation deviation of b is [Figure FDA0002952420430000039];
the corrected value of [Figure FDA00029524204300000310] is a' = a + da;
the corrected value of [Figure FDA00029524204300000311] is b' = b + db;
the lookup table module is used for constructing a sine lookup table LUT and storing the sine lookup table LUT in the texture memory of the graphics processing unit (GPU); the lookup table module uses the accumulated phase [Figure FDA00029524204300000312] to construct a lookup table index value index(iL), and uses the lookup table index value index(iL) to look up the sine lookup table LUT in the texture memory to obtain the sine and cosine quadrature NCO outputs.
7. The numerically controlled oscillator of claim 6, wherein the phase accumulation module is implemented with seven adders, namely a first adder to a seventh adder;
the first adder takes a and da as inputs, performs addition, and outputs the corrected value a' of [Figure FDA00029524204300000313];
the second adder takes b and db as inputs, performs addition, and outputs the corrected value b' of [Figure FDA00029524204300000314];
the third adder takes a' and b' as inputs, performs addition, and outputs the accumulated phase [Figure FDA00029524204300000315];
the fourth adder takes the phase [Figure FDA00029524204300000316] and -a' as inputs, performs addition, and outputs a first intermediate quantity b1, where b1 = φout(iL) - a';
the fifth adder takes b' and -b1 as inputs, performs addition, and outputs a second intermediate quantity a1, where a1 = b' - (φout(iL) - a');
the sixth adder takes a' and -a1 as inputs, performs addition, and outputs the summation deviation da of a;
the seventh adder takes b' and -b1 as inputs, performs addition, and outputs the summation deviation db of b.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010662304.XA CN111984056B (en) 2020-07-10 2020-07-10 GPU (graphics processing Unit) texture cache and accumulated error compensation based numerically-controlled oscillator and implementation method

Publications (2)

Publication Number Publication Date
CN111984056A (en) 2020-11-24
CN111984056B (en) 2021-04-27

Family

ID=73439100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010662304.XA Active CN111984056B (en) 2020-07-10 2020-07-10 GPU (graphics processing Unit) texture cache and accumulated error compensation based numerically-controlled oscillator and implementation method

Country Status (1)

Country Link
CN (1) CN111984056B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114510268B (en) * 2021-12-24 2022-09-20 Peoples Liberation Army Strategic Support Force Aerospace Engineering University GPU-based method for realizing single-precision floating point number accumulated error control in down-conversion

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107565958A (en) * 2016-07-01 2018-01-09 Intel IP Corporation To the gain calibration of digital controlled oscillator in fast lock phase-locked loop

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1835389B (en) * 2005-03-14 2010-07-14 Huawei Technologies Co., Ltd. Method able to eliminate frequency error of digital controlled oscillator and phase accumulator
CN101345886B (en) * 2008-09-03 2011-11-02 Huawei Technologies Co., Ltd. Method and device for phase error correction
CN101854172B (en) * 2009-04-01 2013-01-09 Beijing Institute of Technology Numerical control oscillator parallel design method based on two-dimensional sine table
JP5662040B2 (en) * 2010-03-16 2015-01-28 MegaChips Corporation Numerically controlled oscillator
CN106803818B (en) * 2016-12-08 2020-07-28 Huazhong University of Science and Technology Method and device for receiving TD-AltBOC signal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
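Research on a CUDA-based GPS software receiver (基于CUDA的GPS软件接收机研究); Wu Xinbo (武新波); China Master's Theses Full-text Database (Electronic Journal), Basic Sciences; 20161130; A008-26 *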

Also Published As

Publication number Publication date
CN111984056A (en) 2020-11-24

Similar Documents

Publication Publication Date Title
CN107305484B (en) Nonlinear function operation device and method
CN111984056B (en) GPU (graphics processing Unit) texture cache and accumulated error compensation based numerically-controlled oscillator and implementation method
CN103488245B (en) Phase amplitude conversion method in DDS and device
US8751555B2 (en) Rounding unit for decimal floating-point division
JPH05241794A (en) Device for approximating transcendental function and its method
CN104133656A (en) Floating point number divider adopting shift and subtraction operation by tail codes and floating point number division operation method adopting shift and subtraction operation by tail codes
CN111984057B (en) GPU-based digital NCO high-precision parallel implementation method
CN111813371A (en) Floating-point division operation method, system and readable medium for digital signal processing
CN102566965B (en) Floating-point number logarithmic operation device with flat errors
CN107102841A (en) A kind of coordinate transform parallel calculating method and device
CN107423026A (en) The implementation method and device that a kind of sin cos functionses calculate
CN113126954B (en) Method, device and arithmetic logic unit for floating point number multiplication calculation
CN107015783B (en) Floating point angle compression implementation method and device
CN111831257A (en) Implementation method and device for calculating sine or cosine function
Chekushkin et al. Improving polynomial methods of reconstruction of functional dependences in information-measuring systems
CN105302520A (en) Reciprocal operation solving method and system
CN114510268B (en) GPU-based method for realizing single-precision floating point number accumulated error control in down-conversion
CN107315447A (en) A kind of power Direct Digital Frequency Synthesis and circuit of the conversion of high compression ratio phase amplitude
CN115001485A (en) Direct digital frequency synthesizer based on Taylor polynomial approximation
KR100403374B1 (en) Table Lookup Based Phase Calculator with Normalization of Input Operands for High-Speed Communication
CN103365826A (en) Small-area radical-3 FFT (Fast Fourier Transform) butterfly-shaped unit
CN109687870B (en) Charge redistribution type SARADC capacitance mismatch correction method and system
Maharatna et al. A CORDIC like processor for computation of arctangent and absolute magnitude of a vector
He et al. High‐Performance FP Divider with Sharing Multipliers Based on Goldschmidt Algorithm
CN113721885B (en) Divider based on cordic algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant