CN111292222B - Pulsar dispersion eliminating device and method - Google Patents

Info

Publication number
CN111292222B
Authority
CN
China
Prior art keywords
data
dispersion
gpu
pulsar
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010073731.4A
Other languages
Chinese (zh)
Other versions
CN111292222A (en
Inventor
托乎提努尔
王娜
张海龙
王杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinjiang Astronomical Observatory of CAS
Original Assignee
Xinjiang Astronomical Observatory of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinjiang Astronomical Observatory of CAS
Priority to CN202010073731.4A
Publication of CN111292222A
Application granted
Publication of CN111292222B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/544 Buffers; Shared memory; Pipes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Advance Control (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a pulsar de-dispersion method with a switchable search mode and folding mode, comprising: using an FFT parallel computing module to obtain multi-channel data; initializing the related variables from the parameters of the multi-channel data, decomposing the task, and calculating the dispersion shift; allocating GPU memory and copying the dispersion shift into it; and having the CPU call a GPU kernel function so that the GPU performs the de-dispersion, then copying the de-dispersion result to CPU memory, until de-dispersion is finished. The invention also provides a pulsar de-dispersion device. The invention greatly improves pulsar de-dispersion computing performance and processing speed, meets the real-time de-dispersion requirements of massive data, reduces the amount of data to be stored, and lowers system cost.

Description

Pulsar dispersion eliminating device and method
Technical Field
The invention belongs to the field of astronomical observation, and relates to a pulsar dispersion eliminating device and method.
Background
A pulsar is a rapidly rotating, magnetized neutron star with very high density and a stable rotation period. As it rotates about its axis it emits electromagnetic waves along the direction of its magnetic poles, and when the beam sweeps across the Earth, a radio telescope receives periodic pulse signals.
As a pulsar signal propagates through space, dispersion in the interstellar medium slows it down, and high-frequency radio waves travel faster than low-frequency ones. The high-frequency and low-frequency components therefore reach the radio telescope at different times. This smears the signal energy and distorts the pulse profile: the pulse broadens, the signal-to-noise ratio drops, and the pulse may even disappear.
Because pulsar signals are extremely weak, they must be de-dispersed before a clearly visible pulse profile can be observed. De-dispersion effectively improves the sensitivity of astronomical observation and the ability of an observing system to identify and detect pulsars. In recent years, pulsar research and observation have placed higher demands on de-dispersion technology. De-dispersion systems with ultra-wide bandwidth and high-speed signal processing are the inevitable direction for future radio pulsar observing equipment, and the related technologies face great challenges. Existing de-dispersion techniques have the following shortcomings:
(1) Improved observing equipment has rapidly widened the frequency range over which radio astronomical signals can be observed. As observing bandwidth grows and resolution rises, the data volume becomes enormous, and existing de-dispersion techniques cannot process such massive data quickly in real time. For example, leading-edge instruments such as ultra-wideband receivers, multi-beam receivers, and PAF receivers typically generate data on the order of terabytes, and processing it in real time poses unprecedented challenges for de-dispersion techniques and algorithms.
(2) When the search space of the pulsar dispersion measure (DM) is large, existing de-dispersion methods are inefficient, computationally heavy, and slow. They cannot meet the requirements of high-speed, real-time pulsar searching, and they sharply increase system power consumption, complexity, and cost.
It is therefore necessary to develop a pulsar de-dispersion device and method with higher resolution, wider processing bandwidth, and greater system stability, which is important for pulsar searching, observation, and scientific research.
Disclosure of Invention
The invention aims to provide a high-speed, real-time pulsar de-dispersion device and method that greatly improve the computing performance of pulsar de-dispersion.
To achieve this object, the invention provides a pulsar de-dispersion method with a switchable search mode and folding mode, comprising:
Step S1: providing a data processing module comprising an FFT parallel computing module and a de-dispersion processing module, and using the FFT parallel computing module to divide the pulsar baseband data into a plurality of mutually independent narrow-channel signals, thereby obtaining multi-channel data;
Step S2: using the de-dispersion processing module to carry out data exchange between the CPU and the GPU, multithreaded task allocation, and optimization of GPU parallel computing resources, comprising:
Step S21: initializing the related variables according to the parameters of the multi-channel data, the related variables comprising the number of channels, the DM value, the signal frequency range, and the bandwidth of each channel;
Step S22: to speed up parallel de-dispersion, decomposing the task so that the CPU calculates the dispersion shift and stores it in CPU memory;
the dispersion shift is given by:
$$\mathrm{shift} = 4.15 \times 10^{3} \times \left(f_1^{-2} - f_2^{-2}\right)$$
where shift is the dispersion shift, and $f_1$ and $f_2$ are the frequencies of the low-frequency and high-frequency channels of the multi-channel data, respectively;
Step S23: allocating GPU memory;
Step S24: copying the dispersion shift into the GPU memory allocated in step S23;
Step S25: having the CPU call the GPU kernel function and hand the computation to the GPU for de-dispersion, comprising: launching multiple threads on the GPU to calculate the delay time of each channel of the multi-channel data, performing an accumulation for each dispersion measure value to obtain the de-dispersion result, and writing the result into the global memory of the GPU;
the delay time of a frequency channel is:
[Equation image BDA0002377939210000021: delay time t_DM of the frequency channel]
where $t_{sam}$ is the sampling period of the signal, $t_{DM}$ is the delay time of the frequency channel, DM is the dispersion measure, and shift is the dispersion shift;
Step S26: copying the de-dispersion result to CPU memory, until all of the baseband data has been de-dispersed, and then releasing the GPU memory allocated in step S23.
In search mode, step S1 further includes writing the multi-channel data to disk for temporary storage; step S21 further includes, before initialization, reading the multi-channel data from disk into CPU memory; and step S24 further includes copying the multi-channel data into the GPU memory allocated in step S23. In folding mode, step S1 further includes keeping the multi-channel data in the video memory of the GPU.
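The difference between the two modes can be summarized as an ordered step list. The following is a minimal sketch only: the `Mode` enum, the `plan_steps` function, and the step labels are illustrative names, not identifiers from the patent.

```python
from __future__ import annotations

from enum import Enum


class Mode(Enum):
    SEARCH = "search"
    FOLDING = "folding"


def plan_steps(mode: Mode) -> list[str]:
    """Return the ordered steps for one pass, following S1 to S26 in the text."""
    steps = ["S1: FFT channelization on the GPU"]
    if mode is Mode.SEARCH:
        # Search mode stages the channelized data on disk, then reloads it.
        steps.append("S1: write multi-channel data to disk")
        steps.append("S21: read multi-channel data from disk into CPU memory")
    else:
        # Folding mode leaves the channelized data where the FFT produced it.
        steps.append("S1: keep multi-channel data in GPU video memory")
    steps.append("S21: initialize variables")
    steps.append("S22: compute dispersion shift on the CPU")
    steps.append("S23: allocate GPU memory")
    if mode is Mode.SEARCH:
        steps.append("S24: copy multi-channel data to GPU memory")
    steps.append("S24: copy dispersion shift to GPU memory")
    steps.append("S25: launch de-dispersion kernel")
    steps.append("S26: copy result to CPU memory and free GPU memory")
    return steps


search_plan = plan_steps(Mode.SEARCH)
folding_plan = plan_steps(Mode.FOLDING)
```

Folding mode skips the disk round trip and the host-to-device copy of the multi-channel data, so only the small dispersion shift array crosses the bus.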
In step S24, all threads in a thread block fetch the dispersion shift of each channel of the multi-channel data and store it in shared memory within the GPU memory.
In step S1, the parallel computation of the FFT parallel computing module is implemented with cuFFT functions; step S24 is implemented with the memory management functions of C and/or the CUDA API.
The pulsar de-dispersion method further comprises step S3: writing the de-dispersion result in CPU memory to a file, wherein the file is stored on disk and comprises single-channel time-series data, file header information, and a data part, the file header information comprising the number of channels, the signal bandwidth, the sampling rate, and the dispersion measure.
In another aspect, the invention provides a pulsar de-dispersion device, connected to a receiver, comprising: a signal digitizing module, located on a programmable logic device, arranged to digitize the pulsar analog signal from the receiver, converting it into a digital signal, and then to generate and send data packets; a data receiving module, located on the CPU and comprising a ring buffer in CPU memory, arranged to receive the data packets sent by the signal digitizing module, unpack them, and write the unpacked baseband data into the ring buffer; and a data processing module, located on a GPU platform, having a switchable folding mode and search mode, arranged to read the baseband data from the ring buffer and perform the pulsar de-dispersion method described above on it.
The signal digitizing module is a ROACH2 hardware platform from CASPER at the University of California, Berkeley. It has eight 10-Gigabit Ethernet interfaces, generates UDP packets for high-speed transmission of baseband data, and is programmed graphically.
The baseband data comprises a plurality of data elements, and the ring buffer is arranged so that when one data element is processed, the remaining data elements do not need to be moved from their storage locations.
The pulsar de-dispersion device of the invention uses a programmable logic device to sample the signal and to package and send UDP packets, and uses a GPU with parallel processing capability to perform the FFT and the de-dispersion, so that the strengths of both the FPGA and the GPU computing platforms are fully used. A shared-memory ring buffer in the CPU provides high-speed data transfer and processing across the heterogeneous platform, optimizes the allocation of tasks among the CPU, GPU, and FPGA, and improves the flexibility and scalability of the de-dispersion algorithm, so the device and method offer higher development efficiency and data processing capability at lower development cost. The device uses a high-performance GPU as the core data processing platform and raises GPU resource utilization by making efficient use of the GPU's global memory and shared memory, which reduces computation time. The invention achieves multi-task parallel processing, greatly improves pulsar de-dispersion computing performance and processing speed, meets the real-time de-dispersion requirements of massive data, reduces the amount of data to be stored, and lowers system cost.
Drawings
Fig. 1 is a data processing block diagram of a pulsar de-dispersion device according to an embodiment of the present invention.
Fig. 2 is a CUDA flowchart of the search mode of a pulsar de-dispersion method according to an embodiment of the present invention.
Fig. 3 is a system overview of a pulsar de-dispersion device according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the speed-up of the pulsar de-dispersion method according to an embodiment of the present invention relative to a conventional CPU de-dispersion method.
Detailed Description
The invention will be further illustrated with reference to specific examples. It should be understood that the following examples are illustrative of the present invention and are not intended to limit the scope of the present invention.
Referring to fig. 1, which shows a CUDA flowchart, a pulsar de-dispersion method according to an embodiment of the present invention has a switchable search mode and folding mode and comprises the following steps:
Step S1: providing a data processing module 3 comprising an FFT parallel computing module 31 and a de-dispersion processing module 32, both located on a GPU, and using the FFT parallel computing module 31 to divide the pulsar baseband data into a plurality of mutually independent narrow-channel signals, thereby obtaining the multi-channel data required for de-dispersion. The parallel computation of the FFT parallel computing module 31 is implemented with cuFFT functions.
The FFT parallel computing module 31 copies the baseband data from the ring buffer 23 of the CPU to the global memory of the GPU and then uses the GPU's multithreaded parallel computing resources to perform a fast Fourier transform at high speed, producing the multi-channel data.
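The channelization idea can be illustrated without a GPU. The sketch below is an assumption-laden stand-in for the cuFFT step: it uses a naive pure-Python DFT on consecutive blocks, and it detects power per channel, which the text does not specify. The names `dft` and `channelize` are illustrative.

```python
import cmath
import math


def dft(block):
    """Naive O(N^2) discrete Fourier transform of one block of samples."""
    n = len(block)
    return [
        sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def channelize(baseband, n_chan):
    """Split complex baseband samples into n_chan narrow channels.

    Each consecutive block of n_chan samples is transformed; output row i
    holds the power of every channel at coarse time step i.
    """
    n_blocks = len(baseband) // n_chan
    spectra = []
    for i in range(n_blocks):
        block = baseband[i * n_chan:(i + 1) * n_chan]
        spectra.append([abs(x) ** 2 for x in dft(block)])
    return spectra


# A complex tone that sits exactly in channel 3 of an 8-channel split.
n_chan = 8
tone = [cmath.exp(2j * math.pi * 3 * t / n_chan) for t in range(4 * n_chan)]
spectra = channelize(tone, n_chan)
peak_channels = [row.index(max(row)) for row in spectra]
```

Every block transform is independent of the others, which is why this stage maps naturally onto batched FFTs running on many GPU threads.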
In addition, in search mode, step S1 further includes writing the multi-channel data to disk for temporary storage;
in folding mode, step S1 further includes keeping the multi-channel data in the video memory of the GPU.
Step S2: as shown in fig. 1, the de-dispersion processing module 32 carries out data exchange between the CPU and the GPU, multithreaded task allocation, and optimization of GPU parallel computing resources, specifically comprising the following steps:
Step S21: Initialization. The related variables, such as the number of channels, the DM value, the signal frequency range, and the bandwidth of each channel, are initialized according to the parameters of the multi-channel data.
In addition, in search mode, step S21 further includes, before initialization, reading the multi-channel data from disk into CPU memory. Thus folding mode performs only a simple initialization, whereas search mode first reads the multi-channel data from disk and then initializes.
Step S22: Calculating the dispersion shift. To speed up parallel de-dispersion, the task is decomposed so that the CPU calculates the dispersion shift and stores it in CPU memory.
Specifically, the task decomposition splits the conventional de-dispersion formula into two parts, so that the parallelizable part is processed by the GPU with a high degree of parallelism.
The dispersion shift is given by:
$$\mathrm{shift} = 4.15 \times 10^{3} \times \left(f_1^{-2} - f_2^{-2}\right)$$
where shift is the dispersion shift, and $f_1$ and $f_2$ are the frequencies of the low-frequency and high-frequency channels of the multi-channel data, respectively.
Thus the dispersion shift is calculated serially on the CPU and the result is transferred to the GPU, after which all remaining computation is handed to the GPU's multithreaded processing.
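As a worked illustration of this CPU-side step, the sketch below evaluates the shift formula for every channel of a band. Two assumptions are not stated in the text: frequencies are taken in MHz (which makes the result a delay in seconds per unit DM), and each channel is referenced to the top edge of the band. The function names and the choice of 64 channels are illustrative; the 408 MHz center frequency and 20 MHz bandwidth match the test data described later.

```python
K_DM = 4.15e3  # constant from the text; assumed units: MHz^2 pc^-1 cm^3 s


def dispersion_shift(f_low_mhz, f_high_mhz):
    """shift = 4.15e3 * (f_low^-2 - f_high^-2), per unit DM."""
    return K_DM * (f_low_mhz ** -2 - f_high_mhz ** -2)


def channel_shifts(f_center_mhz, bandwidth_mhz, n_chan):
    """Shift of each channel centre relative to the top edge of the band."""
    f_bottom = f_center_mhz - bandwidth_mhz / 2
    f_top = f_center_mhz + bandwidth_mhz / 2
    chan_bw = bandwidth_mhz / n_chan
    centres = [f_bottom + (i + 0.5) * chan_bw for i in range(n_chan)]
    return [dispersion_shift(f, f_top) for f in centres]


# 64 channels is an assumed value chosen for illustration.
shifts = channel_shifts(408.0, 20.0, 64)
band_edge_shift = dispersion_shift(398.0, 418.0)
```

Under these assumptions the full 398 to 418 MHz band spans roughly 2.4 ms of delay per unit DM. The shifts depend only on the channel frequencies, so they are computed once on the CPU and reused for every trial DM.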
Step S23: and distributing GPU memory. Since the data processing of the CPU and the GPU are relatively independent, the memory space used by the GPU is prepared in advance.
Step S24: in the search mode, the multichannel data and the dispersion amount are horizontally copied into the GPU memory allocated in the step S23; if folding mode is used, the multi-channel data is already in the GPU, so only dispersion amount panning copies the dispersion amount panning into the GPU memory allocated in step S23.
The CPU and the GPU are provided with independent memory spaces, and cannot directly access parameters and variables of the other party. For the GPU to process data, the dispersion shift is first copied from the CPU into the GPU memory allocated in step S23. In the step S24, the memory management function of the C and/or CUDA API is used to translate and copy the multi-channel data and the dispersion into the GPU memory allocated in the step S23, so as to realize data transmission between the GPU video memory and the CPU memory.
In the step S24, all threads in a thread block acquire the dispersion shift of each channel of the multi-channel data, and store the dispersion shift into the shared memory in the GPU memory, so as to implement the replication of the dispersion shift.
Step S25: the CPU is used for calling the kernel function of the GPU, and the calculation task is handed to the GPU for dispersion elimination, which comprises the following steps: enabling multithreading on the GPU to calculate delay time of each channel of the multichannel data, performing accumulation operation on each dispersion value to obtain a dispersion eliminating processing result, and writing the dispersion eliminating processing result into a global memory of the GPU;
the delay time of the frequency channel is as follows:
[Equation image BDA0002377939210000061: delay time t_DM of the frequency channel]
where $t_{sam}$ is the sampling period of the signal, $t_{DM}$ is the delay time of the frequency channel, DM is the dispersion measure, and shift is the dispersion shift.
The dispersion shift used in de-dispersion is held in shared memory, which hides the access latency of global memory. Shared memory is on-chip GPU storage, so it offers higher bandwidth and lower access latency than the GPU's local or global memory, which improves GPU computing performance.
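The kernel's work can be emulated on the CPU to show the data flow. This is a sketch under a stated assumption: the patent gives the delay formula only as an image, so the code assumes the delay in samples is DM × shift / t_sam rounded to the nearest integer, which is the usual form but is not confirmed by the text. The function name, the shift values, and the synthetic pulse are all hypothetical.

```python
def dedisperse(data, shifts, dms, t_sam):
    """Emulate the kernel: one 'thread' per (DM, output sample) pair.

    data[c][t] : power in channel c at sample t
    shifts[c]  : per-unit-DM shift of channel c, in seconds
    dms        : trial DM values
    t_sam      : sampling period in seconds
    """
    n_chan = len(data)
    n_samp = len(data[0])
    out = []
    for dm in dms:                       # on a GPU: one grid dimension
        # Assumed delay law: delay in samples = DM * shift / t_sam.
        delays = [int(round(dm * s / t_sam)) for s in shifts]
        n_out = n_samp - max(delays)
        series = []
        for t in range(n_out):           # on a GPU: the other grid dimension
            acc = 0.0
            for c in range(n_chan):      # each thread sums over the channels
                acc += data[c][t + delays[c]]
            series.append(acc)
        out.append(series)
    return out


# Synthetic input: a unit pulse at DM = 10 spread over 4 channels.
t_sam = 1e-3
shifts = [3e-4, 2e-4, 1e-4, 0.0]         # hypothetical, seconds per unit DM
true_dm, t0, n_samp = 10.0, 5, 32
data = [[0.0] * n_samp for _ in shifts]
for c, s in enumerate(shifts):
    data[c][t0 + int(round(true_dm * s / t_sam))] = 1.0

dms = [0.0, 5.0, 10.0, 15.0]
result = dedisperse(data, shifts, dms, t_sam)
best_dm = max(zip(dms, result), key=lambda pair: max(pair[1]))[0]
```

Only the trial at the true DM realigns all four channels into a single sample. The small `shifts` array is read by every thread, which is the access pattern that benefits from being staged in shared memory.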
Step S26: Copying the de-dispersion result to CPU memory. The CPU calls the cudaMemcpy() function to copy the de-dispersion result to CPU memory, until all of the baseband data has been de-dispersed, and the GPU memory allocated in step S23 is then released.
Step S3: Writing the de-dispersion result in CPU memory to a file for storage. The file is stored on disk and comprises single-channel time-series data, file header information, and a data part; after folding, a clear pulsar signal profile can be seen. The file header information includes the number of channels, the signal bandwidth, the sampling rate, the dispersion measure, and so on.
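A file with a header followed by a data part could look like the sketch below. The byte layout is entirely hypothetical, since the text names the header fields but not their encoding; `HEADER`, `write_result`, and `read_result` are illustrative names, and an in-memory buffer stands in for the disk file.

```python
import io
import struct

# Hypothetical header layout (not specified in the text), little-endian:
# uint32 channel count, float64 bandwidth (MHz), float64 sampling rate (Hz),
# float64 dispersion measure.
HEADER = struct.Struct("<Iddd")


def write_result(fh, n_chan, bandwidth_mhz, sample_rate_hz, dm, series):
    """Write the header, then the time series as 32-bit floats."""
    fh.write(HEADER.pack(n_chan, bandwidth_mhz, sample_rate_hz, dm))
    fh.write(struct.pack(f"<{len(series)}f", *series))


def read_result(fh):
    """Read back the header fields and the time series."""
    n_chan, bw, rate, dm = HEADER.unpack(fh.read(HEADER.size))
    payload = fh.read()
    series = list(struct.unpack(f"<{len(payload) // 4}f", payload))
    return n_chan, bw, rate, dm, series


buf = io.BytesIO()
write_result(buf, 64, 20.0, 1000.0, 10.0, [0.0, 1.0, 4.0, 1.0])
buf.seek(0)
header_and_data = read_result(buf)
```

A fixed-size header lets a reader recover the observation parameters before deciding how to interpret the data part.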
Because search mode is so computationally demanding, the GPU acceleration method of the invention shows its performance advantage most clearly there. The CUDA program flowchart of search mode is shown in fig. 1 and covers data reading and initialization, dispersion shift calculation, GPU memory allocation, data copying, the GPU kernel call, copying of the processing result, and file writing. The computational complexity of de-dispersion is (number of DMs) × (number of frequency channels) × (number of samples). When the de-dispersion tasks run in parallel on the GPU, the GPU needs neither large buffers nor complicated flow control: the de-dispersion processing module 32 runs the same CUDA code on all GPU threads, parallelized over the DM and sample-time dimensions, which yields higher computational efficiency. The de-dispersion processing module 32 sums, for each sample time, the dispersion-compensated values of the multiple frequency channels; that is, each GPU thread performs the summation over the frequency channels.
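The complexity expression and the thread mapping can be made concrete with a little arithmetic. In this sketch the 131072 samples per channel and the 2560 trial DMs are figures that appear in the text, while the 1024 channels and the flat thread indexing scheme are assumptions for illustration.

```python
def work_items(n_dm, n_chan, n_samp):
    """Additions for brute-force de-dispersion: DMs x channels x samples."""
    return n_dm * n_chan * n_samp


def thread_to_task(thread_id, n_samp):
    """Map a flat thread index to a (DM index, sample index) pair."""
    return divmod(thread_id, n_samp)


# 1024 channels is an assumed value; the other two figures are from the text.
n_dm, n_chan, n_samp = 2560, 1024, 131072
total_adds = work_items(n_dm, n_chan, n_samp)
n_threads = n_dm * n_samp
adds_per_thread = total_adds // n_threads
example_task = thread_to_task(131073, n_samp)
```

With one thread per (DM, sample) pair, each thread performs one addition per frequency channel, and no thread depends on another's result.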
The invention maps the computationally heavy FFT algorithm and the accumulation over frequency channels onto GPU threads for parallel processing, which reduces the system's dependence on CPU performance and greatly improves the data processing performance of the whole system.
Tables 1 and 2 show the results of processing multi-channel data loaded into the GPU, with a center frequency of 408 MHz, 131072 samples per channel, and a bandwidth of 20 MHz.
Table 1. CPU and GPU de-dispersion time (unit: s)
[Table images BDA0002377939210000071 and BDA0002377939210000081]
Table 2. CPU and GPU de-dispersion time (unit: s)
[Table image BDA0002377939210000082]
As tables 1 and 2 show, when the number of channels is fixed, the processing time of both the GPU and the CPU increases linearly with the number of DMs, and the GPU de-dispersion time is far shorter than the CPU computation time. When the number of DMs is fixed, both the CPU and the GPU need more time as the number of frequency channels increases, but the CPU computation time is many times longer than that of the GPU. As fig. 2 shows, the speed-up of the parallel algorithm grows as the number of DMs increases. At 2560 DMs the TITAN V GPU reaches its highest speed-up (538 times the CPU speed), after which the speed-up begins to drop. The larger the number of channels, the larger the speed-up of GPU de-dispersion and the better the acceleration obtained.
In short, GPU de-dispersion takes far less time than the CPU computation, with speeds differing by a factor of hundreds, so the GPU greatly shortens the execution time of de-dispersion. The method effectively solves the problem that the CPU platform cannot process in real time because of the huge computational load of de-dispersion.
Fig. 3 shows a pulsar de-dispersion device 100 according to an embodiment of the present invention, suitable for de-dispersion of pulsar signals and related scientific research. The pulsar de-dispersion device 100 is connected to a receiver 200 of a radio telescope and consists mainly of three modules: a signal digitizing module 1, a data receiving module 2, and a data processing module 3.
As shown in figs. 3-4, the signal digitizing module 1 is a signal processing platform located on a programmable logic device (Field Programmable Gate Array, FPGA) and includes a sampling module 11, a data packet generating module 12, and a 10-Gigabit Ethernet (10GbE) interface 13. The sampling module 11 digitizes the pulsar analog signal from the receiver 200, converting it into a digital signal; the data packet generating module 12 generates data packets of baseband data; and the 10GbE interface 13 transmits the data packets to the data receiving module 2. Apart from digitizing the signal and generating packets, the signal digitizing module 1 performs no processing, so the transmitted data is the raw pulsar data, that is, baseband data.
In this embodiment, the signal digitizing module 1 is a ROACH2 hardware platform from CASPER at the University of California, Berkeley. Its core processor is a Xilinx Virtex-6 series FPGA chip, connected by Z-DOK to an A/D sampling board that provides two channels, 8-bit samples, and a maximum sampling rate of 5 GHz. The ROACH2 hardware platform also has eight 10-Gigabit Ethernet (10GbE) interfaces 13; specifically, a 10GbE network card module connects to a 10-Gigabit Ethernet switch, which allows high-speed, real-time data transmission. The development environment of the signal digitizing module 1 comprises Xilinx System Generator, MATLAB/Simulink, and the signal processing library provided by CASPER, so FPGA designs are built graphically, which lowers development difficulty and makes the design efficient to implement. The packets generated by the data packet generating module 12 are preferably UDP packets.
The data receiving module 2 is located on the CPU and includes a 10-Gigabit Ethernet (10GbE) network card 21, a data packet receiving module 22, and a high-speed ring buffer 23 set up in CPU memory. The 10GbE network card 21 is arranged to receive the data packets sent by the signal digitizing module and unpack them to obtain the baseband data; the data packet receiving module 22 is arranged to write the baseband data into the ring buffer 23. The data receiving module 2 is thus arranged to receive the data packets sent by the signal digitizing module, unpack them, and write the unpacked baseband data into the ring buffer 23. The data in the packets received by the network card 21 is unprocessed baseband data, so its volume is large and it cannot be sent directly to the GPU for processing; a ring buffer 23 is therefore set up in CPU memory as temporary storage. While data continues to be buffered, the next stage copies data directly from the ring buffer 23 to the video memory of the GPU, described below, in first-in, first-out order.
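Unpacking a received datagram might look like the following sketch. The packet layout is hypothetical: the text says only that UDP packets carry 8-bit baseband samples, so the 8-byte big-endian sequence counter and the names `SEQ`, `unpack_packet`, and `count_lost` are assumptions for illustration.

```python
import struct

SEQ = struct.Struct(">Q")  # hypothetical 8-byte big-endian packet counter


def unpack_packet(packet: bytes):
    """Split one datagram into (sequence number, signed 8-bit samples)."""
    (seq,) = SEQ.unpack_from(packet, 0)
    payload = packet[SEQ.size:]
    samples = struct.unpack(f"{len(payload)}b", payload)
    return seq, samples


def count_lost(seq_numbers):
    """Count gaps in an ordered run of sequence numbers."""
    return sum(b - a - 1 for a, b in zip(seq_numbers, seq_numbers[1:]))


pkt = SEQ.pack(42) + struct.pack("4b", -3, 0, 7, 127)
seq, samples = unpack_packet(pkt)
lost = count_lost([40, 41, 42, 45])
```

UDP gives no delivery guarantee, so a counter of this kind is one common way for a receiver to detect dropped packets before writing samples into the buffer.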
The baseband data comprises a plurality of data elements, and the ring buffer 23 stores the baseband data in first-in, first-out order, which avoids moving data and greatly improves data access and processing speed. The size of the ring buffer 23 is not fixed and can be set as required, for example 16 MB to 128 MB.
The key to the buffering technique is a first-in, first-out buffer management algorithm. The ring buffer 23 is a data structure representing a fixed-size buffer whose ends are joined, and it is well suited to buffering data streams. It is arranged so that after one data element of the baseband data is processed, the remaining data elements do not need to be moved from their storage locations. While data elements continue to be buffered, the baseband data is copied directly from CPU memory to GPU video memory for processing in first-in, first-out order, which reduces packet loss and solves the problem of high-speed data transfer between heterogeneous platforms. A non-circular buffer, by contrast, requires the remaining data elements to be moved forward after each element is consumed. By implementing the ring buffer, the invention effectively improves data exchange on the heterogeneous platform.
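The property described here, that consuming an element moves an index rather than the data, can be seen in a minimal ring buffer. This sketch is single-threaded and illustrative only; the class name and its policy of rejecting writes when full are assumptions, and a real receiver would also need synchronization between the writer and the reader.

```python
class RingBuffer:
    """Fixed-size FIFO; reads and writes only advance indices."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._head = 0      # index of the next slot to read
        self._count = 0     # number of stored elements

    def put(self, item):
        if self._count == len(self._slots):
            return False    # full: the caller decides whether to drop or wait
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = item
        self._count += 1
        return True

    def get(self):
        if self._count == 0:
            return None
        item = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return item


rb = RingBuffer(3)
accepted = [rb.put(x) for x in ("a", "b", "c", "d")]
first = rb.get()
rb.put("d")
drained = [rb.get() for _ in range(3)]
```

After `"a"` is read, `"d"` is written into the slot it vacated, and `"b"` and `"c"` never change position.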
The data processing module 3 is located on the GPU so that the multithreading by the GPU is exclusively responsible for the parallel computing tasks. The data processing module 3 comprises an FFT parallel computing module 31 and a dispersion processing module 32, arranged to first read the baseband data in the ring buffer 23 and perform the pulsar dispersion method according to the above description on the baseband data, thereby implementing the FFT multi-channel filtering and dispersion processing in parallel on a high-performance GPU platform. As described above, the signal processing flow includes a series of processes of buffer data acquisition, FFT, initialization, dispersion translation calculation, video memory allocation, data transmission, GPU kernel function invoking, processing result writing, and the like, so as to complete data exchange between the CPU and the GPU, multithreading task allocation, and optimization of GPU parallel computing resources.
The FFT parallel computing module 31 employs the cuFFT acceleration library of the CUDA parallel computing architecture, which greatly increases the speed of the FFT operation. cuFFT allows flexible data layouts and processes 1D FFT transforms efficiently. The GPU cannot communicate directly with the CPU memory; data are exchanged between the CPU memory and the GPU video memory over the PCI-E bus.
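The division of baseband data into narrow channels by block-wise 1D transforms can be illustrated with the following Python sketch. It uses a naive discrete Fourier transform in place of cuFFT so that it runs without a GPU; the function names dft and channelize and the choice to drop trailing samples that do not fill a block are assumptions of this example, not details of the embodiment.

```python
import cmath


def dft(block):
    """Naive O(N^2) discrete Fourier transform of one block (stand-in for an FFT)."""
    n = len(block)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(block))
            for k in range(n)]


def channelize(samples, n_chan):
    """Split a sample stream into n_chan narrow channels.

    Each consecutive block of n_chan samples is transformed; bin k of successive
    blocks forms the time series of channel k. Trailing samples that do not fill
    a block are dropped.
    """
    n_blocks = len(samples) // n_chan
    spectra = [dft(samples[b * n_chan:(b + 1) * n_chan]) for b in range(n_blocks)]
    # Transpose so the result is indexed as channels[k][block].
    return [[spectra[b][k] for b in range(n_blocks)] for k in range(n_chan)]
```

A batched FFT library would compute all block transforms in one call; the transposed layout shown here is one possible way to present each channel as its own time series.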
In this embodiment, the CPU is an Intel Xeon E5-1620 and the GPU is an NVIDIA TITAN V of the new-generation GeForce series, and the software environment is built on the latest CUDA and a Linux system. To improve the data transmission and computing performance of the GPU, the design adopts the stream mode provided by CUDA, which separates the two directions of data transfer between the CPU memory and the GPU video memory, so that the GPU can compute while data are being exchanged between the CPU memory and the GPU video memory.
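The idea of overlapping data transfer with computation can be illustrated, by analogy only, with a CPU-side producer thread and a bounded queue in Python. This is not CUDA stream code: the function run_pipeline and its transfer and compute parameters are hypothetical stand-ins for a host-to-device copy and a kernel launch, and the sketch does not handle an exception raised inside the transfer stage.

```python
import queue
import threading


def run_pipeline(chunks, transfer, compute, depth=2):
    """Overlap a transfer stage with a compute stage, chunk by chunk.

    transfer and compute are caller-supplied callables (hypothetical stand-ins
    for a host-to-device copy and a kernel launch). depth bounds the number of
    staged chunks, similar in spirit to double buffering.
    """
    staged = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for chunk in chunks:
            staged.put(transfer(chunk))  # blocks while `depth` chunks are waiting
        staged.put(done)

    worker = threading.Thread(target=producer)
    worker.start()
    results = []
    while True:
        item = staged.get()
        if item is done:
            break
        results.append(compute(item))  # the producer may stage the next chunk meanwhile
    worker.join()
    return results
```

For example, run_pipeline([[1, 2], [3, 4], [5]], transfer=list, compute=sum) processes the chunks in arrival order while the next chunk is being staged.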
The foregoing is only a preferred embodiment of the present invention and is not intended to limit its scope; various modifications may be made to the above embodiment. All simple, equivalent changes and modifications made in accordance with the claims and the specification of the present application fall within the scope of the claims of the present patent. Matters not described in detail in the present invention are conventional technical content.

Claims (8)

1. A pulsar de-dispersion method having a switchable search mode and folding mode, comprising:
step S1: providing a data processing module (3) comprising an FFT parallel computing module (31) and a de-dispersion processing module (32), and using the FFT parallel computing module to divide the baseband data of the pulsar into a plurality of mutually independent narrow-channel signals so as to obtain multi-channel data;
step S2: using the de-dispersion processing module to complete the data exchange between the CPU and the GPU, the multithreaded task allocation, and the optimization of GPU parallel computing resources, comprising:
step S21: initializing related variables according to parameters of the multi-channel data, the related variables comprising the number of channels, the DM value, the signal frequency range, and the bandwidth of each channel;
step S22: to increase the speed of the parallel de-dispersion computation, decomposing the task by controlling the CPU to calculate the dispersion shift and storing the dispersion shift in the CPU memory;
the dispersion shift is expressed by the formula:
$\mathrm{shift} = 4.15 \times 10^{3} \times \left(f_1^{-2} - f_2^{-2}\right)$
where shift is the dispersion shift, and $f_1$ and $f_2$ are the frequencies of the channels of the multi-channel data, namely the low frequency and the high frequency, respectively;
step S23: distributing GPU memory;
step S24: copying the dispersion amount translation into the GPU memory allocated in the step S23;
step S25: the CPU is used for calling the kernel function of the GPU, and the calculation task is handed to the GPU for dispersion elimination, which comprises the following steps: enabling multithreading on the GPU to calculate delay time of each channel of the multichannel data, performing accumulation operation on each dispersion value to obtain a dispersion eliminating processing result, and writing the dispersion eliminating processing result into a global memory of the GPU;
the delay time of a frequency channel is:
$t_{DM} = \mathrm{shift} \times DM / t_{samp}$
where $t_{samp}$ is the sampling period of the signal, $t_{DM}$ is the delay time of the frequency channel, DM is the dispersion measure, and shift is the dispersion shift;
step S26: copying the de-dispersion processing result to the CPU memory until the de-dispersion of all the baseband data is finished, and then releasing the GPU memory allocated in step S23.
2. The pulsar de-dispersion method according to claim 1, wherein in the search mode, the step S1 further comprises: writing the multi-channel data to a disk for temporary storage; the step S21 further comprises: before initialization, reading the multi-channel data from the disk and storing them in the CPU memory; and the step S24 further comprises: copying the multi-channel data into the GPU memory allocated in step S23;
in the folding mode, the step S1 further comprises: keeping the multi-channel data in the video memory of the GPU.
3. The pulsar de-dispersion method according to claim 1, wherein in the step S24, all threads in one thread block respectively acquire the dispersion shift of each channel of the multi-channel data and store it in the shared memory within the GPU memory.
4. The pulsar de-dispersion method according to claim 1, wherein in the step S1, the parallel computation of the FFT parallel computing module is implemented using cuFFT functions, and the step S24 is implemented using memory management functions of the C and/or CUDA API.
5. The pulsar de-dispersion method according to claim 1, further comprising step S3: writing the de-dispersion processing result in the CPU memory into a file, wherein the file is stored on a disk and comprises single-channel time-series data, file header information, and a data part, the file header information comprising the number of channels, the signal bandwidth, the sampling rate, and the dispersion measure.
6. A pulsar de-dispersion device, coupled to a receiver (200), comprising:
a signal digitizing module (1), located on a programmable logic device and arranged to digitize the pulsar analog signals from the receiver (200) into digital signals and then to generate and transmit data packets;
a data receiving module (2), located on the CPU and comprising a ring buffer (23) in the CPU memory, arranged to receive the data packets sent by the signal digitizing module, unpack them, and write the unpacked baseband data into the ring buffer (23); and
a data processing module (3), located on a GPU platform and having two switchable modes, namely a folding mode and a search mode, arranged to read the baseband data in the ring buffer (23) and to perform the pulsar de-dispersion method according to any one of claims 1 to 5 on the baseband data.
7. The pulsar de-dispersion device according to claim 6, wherein the signal digitizing module (1) is a ROACH2 hardware platform from CASPER at the University of California, Berkeley, which is equipped with eight 10-gigabit Ethernet interfaces (13), generates UDP data packets, and transmits the baseband data at high speed; the signal digitizing module (1) is a signal processing platform located on an FPGA, and the FPGA is developed by means of graphical programming.
8. The pulsar de-dispersion device according to claim 6, wherein the baseband data comprise a plurality of data elements, and the ring buffer is arranged such that, after one data element of the baseband data is processed, the remaining data elements do not need to be moved from their storage locations.
CN202010073731.4A 2020-01-22 2020-01-22 Pulsar dispersion eliminating device and method Active CN111292222B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010073731.4A CN111292222B (en) 2020-01-22 2020-01-22 Pulsar dispersion eliminating device and method

Publications (2)

Publication Number Publication Date
CN111292222A (en) 2020-06-16
CN111292222B (en) 2023-05-12

Family

ID=71022402

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant