CN104090993B - Very long baseline interferometry (VLBI) correlation processing implementation method - Google Patents

Very long baseline interferometry (VLBI) correlation processing implementation method

Info

Publication number
CN104090993B
CN104090993B
Authority
CN
China
Prior art keywords
baseline
vlbi
signal
stations
station
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410240777.5A
Other languages
Chinese (zh)
Other versions
CN104090993A (en)
Inventor
陈蓉
王静温
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Long March Launch Vehicle Technology Co Ltd
Beijing Institute of Telemetry Technology
Original Assignee
Aerospace Long March Launch Vehicle Technology Co Ltd
Beijing Institute of Telemetry Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Long March Launch Vehicle Technology Co Ltd, Beijing Institute of Telemetry Technology filed Critical Aerospace Long March Launch Vehicle Technology Co Ltd
Priority to CN201410240777.5A priority Critical patent/CN104090993B/en
Publication of CN104090993A publication Critical patent/CN104090993A/en
Application granted granted Critical
Publication of CN104090993B publication Critical patent/CN104090993B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a very long baseline interferometry (VLBI) correlation processing implementation method. The VLBI correlation processing procedure is implemented in a baseline-parallel manner on a platform consisting of a CPU (central processing unit) and a GPU (graphics processing unit) coprocessor, using a hybrid parallel mode based on MPI (message passing interface) and CUDA (compute unified device architecture), thereby supporting the application of the efficient MPI+CUDA computing model to the field of VLBI correlation processing. The baseline-parallel scheme effectively accelerates the VLBI correlation processing procedure in parallel and makes full use of the efficient computing capability of the GPU and the task-distribution and scheduling capability of the multi-core CPU; the running efficiency of the VLBI correlation processing procedure is therefore improved, while the heterogeneous platform and the hybrid parallel mode guarantee the flexibility and extensibility of the implementation method.

Description

A very long baseline interferometry (VLBI) correlation processing implementation method
Technical field
The present invention relates to a very long baseline interferometry (VLBI) correlation processing implementation method, in particular to a VLBI correlation processing method based on an MPI and CUDA hybrid parallel mode, and belongs to the field of very long baseline interferometry.
Background art
Very long baseline interferometry (VLBI) is an important radio interferometry technique developed since the late 1960s. By performing correlation operations on the observation data of multiple radio telescopes, it synthesizes these telescopes into an equivalent telescope whose aperture equals the baseline length. VLBI uses highly stable atomic clocks as independent local-oscillator and timing references for each station, which removes the restriction on baseline length and achieves high spatial and temporal resolution; it is therefore widely used in fields such as astronomy, geodesy and deep-space tracking.
Correlation processing is the core of VLBI data processing and is both data-intensive and computation-intensive. Implementation techniques mainly include hardware correlation processing based on application-specific integrated circuits (ASIC) or field-programmable gate arrays (FPGA), and software correlation processing based on general-purpose computer platforms. Implementing VLBI correlation processing with high-performance FPGAs requires developing dedicated hardware boards; the implementation is complex, the resources are limited, and the extensibility is poor when the number of baselines must be increased. Although implementing VLBI correlation processing on a general-purpose computer platform reduces the implementation difficulty and improves extensibility, the parallel processing capability of such a platform is limited and can hardly cope with the intensive computation of VLBI correlation processing.
Since the late 1990s, with the popularization of commercial high-performance computer systems, VLBI software correlation processing based on modern high-performance PC or server platforms has received great attention from research institutions at home and abroad and has become a new research hotspot in the VLBI field. By combining high-performance PCs or servers into a cluster and configuring a message passing interface (MPI) environment for the cluster, higher computing performance can be obtained. Meanwhile, the compute unified device architecture (CUDA) opens the door to general-purpose computing with the powerful computing capability of the GPU (graphics processing unit), making an efficient computing model based on a CPU+GPU heterogeneous platform and an MPI+CUDA hybrid parallel environment possible.
Content of the invention
The technical problem solved by the present invention is: to overcome the deficiencies of the prior art and to provide a very long baseline interferometry (VLBI) correlation processing implementation method that uses a small-scale CPU+GPU cluster as the platform and is implemented based on an MPI and CUDA hybrid parallel mode, thereby supporting the application of the efficient MPI+CUDA computing model to the field of VLBI correlation processing.
The technical solution of the present invention is: a very long baseline interferometry (VLBI) correlation processing implementation method, comprising the following steps:
(1) build a development platform using a GPU and a CPU, configure the compute unified device architecture (CUDA) environment on the platform, and configure the message passing interface (MPI) environment on the CPU;
(2) the CPU determines the required number of MPI concurrent processes according to the number of VLBI baselines it has to process, and creates the MPI concurrent processes;
(3) the CPU assigns a corresponding VLBI baseline to each MPI process and starts all MPI processes;
(4) each MPI process obtains the data files and parameter files of the two stations of its corresponding VLBI baseline, and from them obtains the signal sample data of the two stations of the baseline, the delay values for integer-bit delay correction of the two stations, the delay values for fringe rotation, the delay values for fractional-bit delay correction, and the carrier frequency information;
(5) each MPI process, according to the signal sample data of the two stations of its corresponding VLBI baseline and the integer-bit delay correction values of the two stations, uses the CUDA environment on the GPU to perform the integer-bit delay correction of the two station signals of the baseline in parallel, and mixes the two station signals of the baseline with the down-conversion local-oscillator signal respectively, obtaining the signals of the two stations of the baseline after integer-bit delay correction and down-conversion; the down-conversion local-oscillator signal is calculated from the down-conversion local-oscillator frequency in the carrier frequency information;
(6) each MPI process, according to the fringe-rotation delay values of the two stations of the baseline, uses the CUDA environment on the GPU to apply fringe rotation in parallel to the two station signals of the baseline obtained in step (5), bringing the two station signals close to each other and obtaining the signals of the two stations of the baseline after fringe rotation;
(7) each MPI process uses the CUDA environment to perform a parallel fast Fourier transform on the signals of the two stations of the baseline obtained in step (6), transforming the signals of the two stations of the baseline from the time domain to the frequency domain;
(8) each MPI process, according to the signals of the two stations of the baseline obtained in step (7) and the fractional-bit delay correction values, uses the CUDA environment on the GPU to perform the fractional-bit delay correction of the two station signals of the baseline in parallel, obtaining the signals of the two stations of the baseline after fractional-bit delay correction;
(9) each MPI process uses the CUDA environment on the GPU to cross-multiply, in parallel, the corresponding sample points of the two station signals of the baseline processed in step (8), and sums the cross-multiplication results with a parallel reduction algorithm, completing the cross-correlation of the two station signals of the baseline; the VLBI cross-correlation result is obtained and output, and this result is the phase fringe data of the two station signals of the baseline after VLBI correlation processing;
In steps (5) to (9), each MPI process is responsible only for processing the signals of the two stations of the VLBI baseline corresponding to that MPI process.
In step (1), the development platform built with GPUs and CPUs is either a single heterogeneous platform or a small-scale cluster formed by multiple heterogeneous platforms, where one GPU combined with one CPU constitutes a heterogeneous platform.
When fringe rotation is applied in parallel in step (6) to the two station signals of the baseline obtained in step (5), either only the sample points of one station signal of the baseline are rotated, or the sample points of both station signals of the baseline are rotated simultaneously, so that the two station signals are brought close to each other.
The CPU is a multi-core CPU server.
Compared with the prior art, the present invention has the following advantages: according to the VLBI correlation processing algorithm, the present invention proposes a VLBI correlation processing implementation method that uses a small-scale CPU+GPU cluster as the platform and is based on an MPI and CUDA hybrid parallel mode, supporting the application of the efficient MPI+CUDA computing model to the field of VLBI correlation processing. The main advantages of the method are as follows:
(1) Easy to implement: hardware correlation processing based on FPGAs requires developing dedicated hardware boards and firmware, which is difficult to implement and has a long development cycle, and the debugging tools and means for hardware are limited. The development platform of the present invention, based on a small-scale CPU+GPU cluster, is easy to obtain; with the CUDA architecture the GPU can be invoked from a C-language environment, coding is easy and the debugging process is simple.
(2) Fast running speed: software correlation processing using only a general-purpose computer platform is also easy to implement and debug, but it can only exploit CPU parallelism, the maximum number of concurrent processes is comparable to the number of CPUs, and its parallel computing capability is far lower than that of a GPU. The present invention combines multi-core CPUs with GPU coprocessors to build a heterogeneous platform, making full use of the efficient floating-point processing capability of the GPU and the good task-distribution and scheduling capability of the multi-core CPU, and exploiting the parallel processing capability of a single compute node to the greatest extent.
(3) Strong extensibility: hardware correlation processing based on FPGAs is constrained by the characteristics of the hardware itself and its resources are limited; if the number of baselines increases and more resources are needed, the FPGA devices must be replaced and the hardware boards redesigned, so the extensibility is poor. The heterogeneous platform built by combining multi-core CPUs with GPU coprocessors and the MPI and CUDA hybrid parallel mode adopted by the present invention both have strong extensibility; when the number of baselines needs to be increased, it is only necessary to add compute nodes to the existing platform.
Brief description of the drawings
Fig. 1 is a schematic diagram of the heterogeneous platform built by combining multi-core CPUs with GPU coprocessors adopted by the present invention;
Fig. 2 is a schematic flow chart of a VLBI correlation processing implementation method of the present invention;
Fig. 3 is a schematic diagram of the multi-threaded parallel processing procedure realized by the method proposed by the present invention.
Specific embodiment
Since the late 1990s, with the popularization of commercial high-performance computer systems, VLBI software correlation processing based on modern high-performance PC or server platforms has received great attention from research institutions at home and abroad and has become a new research hotspot in the VLBI field. By combining high-performance PCs or servers into a cluster and configuring a message passing interface (MPI) environment for the cluster, higher computing performance can be obtained. Meanwhile, the compute unified device architecture (CUDA) opens the door to general-purpose computing with the powerful computing capability of the GPU (graphics processing unit), making an efficient computing model based on a CPU+GPU heterogeneous platform and an MPI+CUDA hybrid parallel environment possible.
According to the VLBI correlation processing algorithm, the present invention proposes a VLBI correlation processing implementation method that uses a small-scale CPU+GPU cluster as the platform and is based on an MPI and CUDA hybrid parallel mode. The parallel implementation schemes of VLBI correlation processing generally include baseline-parallel, station-parallel, channel-parallel and time-parallel structures; the present invention adopts the baseline-parallel structure.
MPI is one of the standards for message-passing parallel programming and is a set of application programming interfaces (API) for parallel computing; this embodiment adopts the MPI-2 standard. MPI supports both the Fortran language and the C language; this embodiment adopts the C language.
CUDA is a parallel computing architecture as well as a programming model. CUDA enables the GPU to solve complex computational problems and allows the CPU and the GPU to cooperate, each using its own advantages, to complete parallel computing applications.
The present invention is executed by entering commands and parameters from the command line. The embodiment uses a heterogeneous platform built by combining multi-core CPUs with GPU coprocessors, as illustrated in Fig. 1, and runs on new-generation GPUs based on the Kepler architecture; the maximum number of threads that each Kepler-architecture GPU can support reaches 32768.
A schematic flow chart of the VLBI correlation processing implementation method of the present invention is shown in Fig. 2.
Fig. 3 is a schematic diagram of the multi-threaded parallel processing procedure in the embodiment. The embodiment of the present invention comprises the following steps:
(1) GPUs and multi-core CPU servers are combined to build a heterogeneous platform; one multi-core CPU server equipped with a GPU coprocessor constitutes one heterogeneous platform. In this embodiment, two multi-core CPU servers equipped with GPU coprocessors (i.e. two heterogeneous platforms) are combined into a small-scale cluster as the development platform. On this development platform a CUDA environment is configured for each heterogeneous platform, an MPI environment is configured on each multi-core CPU server, node numbers are assigned to the multi-core CPU servers in the cluster, and one CPU is selected as the host node. In this embodiment the two multi-core CPU servers equipped with GPU coprocessors are two nodes, numbered 0 and 1, of which node 0 is the host node.
(2) The host-node server in the cluster reads the start command from the command line; the command contains the number of baselines, the relative path of the data files, the relative path of the parameter files and the relative path of the output files.
According to the number of VLBI baselines, the host CPU node in the cluster assigns the number of baselines to be processed to each CPU node (including the host node). Each CPU node initializes its MPI environment, determines the required number of MPI concurrent processes according to the number of baselines it has to process, and creates the MPI concurrent processes.
In this embodiment the number of baselines is 2, each of the two CPU nodes in the cluster corresponds to one baseline, and each CPU node completes the initialization of its MPI environment and creates one MPI concurrent process.
(3) Each CPU node assigns the corresponding VLBI baseline to its MPI process, obtains the process rank of its MPI process in the given communicator, and looks up the baseline list according to the process rank. The baseline list is an array of structures; each array element contains the baseline number, the names of the two stations of the baseline and the code names of the two stations of the baseline. At the same time, each CPU node starts the corresponding MPI process function.
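As a concrete illustration of steps (2)-(3), the following minimal C sketch shows how each MPI process could obtain its rank in the communicator and look up its baseline in an array-of-structures baseline list. The structure fields, the station names and the two-entry table are illustrative assumptions; the patent does not specify them.

```c
/* Minimal sketch of steps (2)-(3): one MPI process per baseline looks up its
 * entry in the baseline list by rank. Field names, station names and the
 * two-entry table are illustrative assumptions, not taken from the patent. */
#include <mpi.h>
#include <stdio.h>

typedef struct {
    int  baseline_id;           /* baseline number                */
    char station_name[2][32];   /* names of the two stations      */
    char station_code[2][8];    /* code names of the two stations */
} BaselineEntry;

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* process rank in the given communicator */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* equals the number of baselines         */

    /* baseline list: an array of structures, one element per baseline */
    BaselineEntry baselines[2] = {
        { 0, { "StationA", "StationB" }, { "SA", "SB" } },
        { 1, { "StationA", "StationC" }, { "SA", "SC" } },
    };
    const BaselineEntry *mine = &baselines[rank];

    printf("rank %d of %d handles baseline %d (%s-%s)\n",
           rank, size, mine->baseline_id,
           mine->station_name[0], mine->station_name[1]);

    /* steps (4)-(9) for this baseline would run here */

    MPI_Finalize();
    return 0;
}
```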
(4) Each MPI process combines the names and code names of the two stations of its VLBI baseline, obtained from the baseline list, with the relative paths of the data files, parameter files and output files obtained from the command line, to form the full paths of the data files, parameter files and output files of the two stations of the baseline. It then reads the data files and parameter files of the two stations of its baseline and obtains the signal sample data of the two stations of the VLBI baseline and the parameter information corresponding to the two station data sets: the delay values for integer-bit delay correction, the delay values for fringe rotation, the delay values for fractional-bit delay correction, and the carrier frequencies (including the down-conversion local-oscillator frequency and the radio-frequency carrier frequency).
Each MPI process copies the signal data of the two stations of its baseline and the parameter information from the memory of the CPU on which it runs into the global memory of the GPU corresponding to that CPU.
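A minimal CUDA sketch of this host-to-device copy is given below; the buffer names, element type and sizes are illustrative assumptions.

```c
/* Sketch of the step (4) host-to-device transfer: the two station signals and
 * the parameter table are copied into GPU global memory. Names and sizes are
 * illustrative assumptions. */
#include <cuda_runtime.h>

void upload_baseline_data(const float *h_sig1, const float *h_sig2,
                          const float *h_params,
                          size_t n_samples, size_t n_params,
                          float **d_sig1, float **d_sig2, float **d_params)
{
    cudaMalloc((void **)d_sig1,   n_samples * sizeof(float));
    cudaMalloc((void **)d_sig2,   n_samples * sizeof(float));
    cudaMalloc((void **)d_params, n_params  * sizeof(float));

    /* copy station 1 signal, station 2 signal and the parameter information */
    cudaMemcpy(*d_sig1,   h_sig1,   n_samples * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(*d_sig2,   h_sig2,   n_samples * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(*d_params, h_params, n_params  * sizeof(float), cudaMemcpyHostToDevice);
}
```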
(5) Each MPI process launches a CUDA thread block on the GPU that performs the integer-bit delay correction. Each thread in the CUDA thread block, according to the signal sample data of the two stations of the VLBI baseline corresponding to this MPI process and the integer-bit delay values of the two stations, calculates the number of sample points by which the starting sample instants of the two stations are offset, then, according to the result, takes a pair of delay-aligned sample points of the two stations of the baseline after integer-bit delay correction, and computes the down-conversion local-oscillator sample point corresponding to each of the two sample points.
For example, if in a specific implementation station 1 lags station 2 by an integer-bit delay of 0.5 ms and the sampling rate of the two station signals is 50 kHz, then the integer-bit delay offset between the signals of station 1 and station 2 at the starting instant is 0.5*50000/1000 = 25 sample points. Thread 0 in the CUDA thread block then takes the 26th sample point of station 1 and the 1st sample point of station 2, computes the down-conversion local-oscillator sample point for the station 1 sample from the down-conversion local-oscillator frequency in GPU global memory and the index 26, and computes the down-conversion local-oscillator sample point for the station 2 sample from the down-conversion local-oscillator frequency in GPU global memory and the index 1; thread 1 takes the 27th sample point of station 1 and the 2nd sample point of station 2 and computes the corresponding down-conversion local-oscillator sample points from the down-conversion local-oscillator frequency and the indices 27 and 2, and so on.
Each thread then mixes its two sample points with the down-conversion local-oscillator sample points it has produced, and writes the results back to the corresponding positions in GPU global memory. In a specific implementation, thread 0 in the thread block operates on the storage location at offset 0 of the station 1 signal data block and the storage location at offset 0 of the station 2 signal data block in GPU global memory; thread 1 operates on the storage locations at offset 1 of the station 1 and station 2 signal data blocks, and so on.
All threads in the CUDA thread block execute the same operation simultaneously, so the coarse delay correction of all valid sample points of the two station signals of the baseline is realized in a multi-threaded parallel manner and each signal is mixed with the down-conversion local-oscillator signal, yielding the signals of the two stations of the baseline after integer-bit delay correction and down-conversion.
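The kernel below is a minimal sketch of this step: each thread takes one delay-aligned sample of each station, generates the corresponding down-conversion local-oscillator sample from the LO frequency and the sample index, mixes, and writes the result back. The kernel name, the real-input/complex-output layout and the parameter names are illustrative assumptions, not the patent's exact implementation.

```c
/* Sketch of the step (5) kernel: integer-bit (coarse) delay correction plus
 * down-conversion mixing. Buffer layout and parameter names are illustrative
 * assumptions. */
#include <cuComplex.h>

__global__ void coarse_delay_and_mix(const float *sig1, const float *sig2,
                                     cuFloatComplex *out1, cuFloatComplex *out2,
                                     int offset1, int offset2, /* integer-bit delays in samples */
                                     float f_lo, float f_s,    /* LO frequency, sampling rate   */
                                     int n_valid)
{
    const float TWO_PI = 6.283185307f;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_valid) return;

    /* delay-aligned sample of each station (e.g. thread 0 takes sample 26 of
       station 1 and sample 1 of station 2 when offset1 - offset2 = 25) */
    float s1 = sig1[i + offset1];
    float s2 = sig2[i + offset2];

    /* down-conversion local-oscillator sample for each station */
    float ph1 = -TWO_PI * f_lo * (float)(i + offset1) / f_s;
    float ph2 = -TWO_PI * f_lo * (float)(i + offset2) / f_s;

    /* mix and write back at the same offset in the output blocks */
    out1[i] = make_cuFloatComplex(s1 * cosf(ph1), s1 * sinf(ph1));
    out2[i] = make_cuFloatComplex(s2 * cosf(ph2), s2 * sinf(ph2));
}
```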
(6) On the basis of step (5), each MPI process launches a CUDA thread block on the GPU that performs the fringe rotation. This embodiment rotates only the signal frequency axis of one station of the baseline towards the frequency axis of the other station (in practical applications the frequency axes of both station signals of the baseline may also be rotated simultaneously so that the two frequency axes approach each other).
In a specific implementation, station 2 is kept unchanged and only the signal frequency of station 1 is rotated. Thread 0 in the thread block takes the 1st sample point of station 1, thread 1 takes the 2nd sample point of station 1, and so on. Each thread then reads from GPU global memory the fringe-rotation delay value corresponding to the station 1 sample point it handles, computes the phase delay value, uses the phase delay value to compute the fringe-rotation radio-frequency carrier sample point, mixes the station 1 sample point it handles with that carrier sample point, and writes the result back to the corresponding position in GPU global memory. Specifically, thread 0 in the thread block operates on the storage location at offset 0 of the station 1 signal data block in GPU global memory, thread 1 operates on the storage location at offset 1, and so on.
All threads in the CUDA thread block execute the same operation simultaneously, so the fringe rotation of the station 1 sample points of the baseline is realized in a multi-threaded parallel manner, bringing the signal frequency of station 1 close to that of station 2 and yielding the signals of the two stations of the baseline after fringe rotation.
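A minimal sketch of this kernel is shown below: each thread reads the fringe-rotation delay of its station 1 sample, turns it into a phase, forms the rotating carrier sample and multiplies it into the signal in place. The kernel name, the complex in-place layout and the use of a single RF carrier frequency are illustrative assumptions.

```c
/* Sketch of the step (6) kernel: fringe rotation of the station 1 signal only;
 * station 2 is left unchanged. Names and conventions are illustrative
 * assumptions. */
#include <cuComplex.h>

__global__ void fringe_rotate(cuFloatComplex *sig1,       /* station 1, rotated in place          */
                              const float *fringe_delay,  /* per-sample fringe-rotation delay [s] */
                              float f_rf,                 /* radio-frequency carrier [Hz]         */
                              int n_valid)
{
    const float TWO_PI = 6.283185307f;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_valid) return;

    /* phase delay value for this sample, then the fringe-rotation carrier sample */
    float phase = -TWO_PI * f_rf * fringe_delay[i];
    cuFloatComplex carrier = make_cuFloatComplex(cosf(phase), sinf(phase));

    /* mix and write back to the same offset of the station 1 data block */
    sig1[i] = cuCmulf(sig1[i], carrier);
}
```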
(7) Each MPI process performs a parallel fast Fourier transform on the signals of the two stations of its VLBI baseline obtained in step (6) through the cuFFT library of CUDA, transforming the signals of the two stations of the VLBI baseline from the time domain to the frequency domain.
In a specific implementation, the GPU-based function library cuFFT provided by NVIDIA is used: a one-dimensional cuFFT handle is first created with the cufftPlan1d() function, and then a parallel FFT is performed on the signals of the two stations of the VLBI baseline with the cufftExecC2C() function.
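A minimal sketch of this cuFFT usage is given below; the FFT length, in-place execution and single-batch plan are illustrative assumptions.

```c
/* Sketch of step (7): transform both station signals to the frequency domain
 * with cuFFT. FFT length and in-place layout are illustrative assumptions. */
#include <cufft.h>

void fft_station_signals(cufftComplex *d_sig1, cufftComplex *d_sig2, int fft_len)
{
    cufftHandle plan;
    cufftPlan1d(&plan, fft_len, CUFFT_C2C, 1);         /* one-dimensional C2C handle */

    cufftExecC2C(plan, d_sig1, d_sig1, CUFFT_FORWARD); /* station 1, in place */
    cufftExecC2C(plan, d_sig2, d_sig2, CUFFT_FORWARD); /* station 2, in place */

    cufftDestroy(plan);
}
```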
(8) Each MPI process launches a CUDA thread block on the GPU that performs the fractional-bit delay correction. Each thread of the CUDA thread block takes two frequency-domain sample points, one from each of the two station signals of the VLBI baseline corresponding to this MPI process, reads from GPU global memory the fractional-bit delay correction values corresponding to those two frequency-domain sample points, applies the delay correction smaller than one sampling interval to the two frequency-domain sample points, and finally writes the results back to the corresponding positions in GPU global memory.
In a specific implementation, thread 0 in the thread block operates on the storage location at offset 0 of the station 1 signal data block and the storage location at offset 0 of the station 2 signal data block in GPU global memory; thread 1 operates on the storage locations at offset 1 of the station 1 and station 2 signal data blocks, and so on.
All threads in the CUDA thread block execute the same operation simultaneously, so the delay correction smaller than one sampling interval is applied to every frequency-domain sample point of the two station signals of the baseline in a multi-threaded parallel manner, yielding the signals of the two stations of the baseline after fractional-bit delay correction.
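The kernel below sketches this step as a per-bin phase correction: each thread reads the fractional delay value of its frequency bin for each station and applies the corresponding phase rotation. The kernel name, the bin-frequency convention and the per-bin delay arrays are illustrative assumptions.

```c
/* Sketch of the step (8) kernel: fractional-bit (sub-sample) delay correction
 * applied in the frequency domain as a phase slope. Names and conventions are
 * illustrative assumptions. */
#include <cuComplex.h>

__global__ void fractional_delay(cuFloatComplex *spec1, cuFloatComplex *spec2,
                                 const float *tau1, const float *tau2, /* per-bin residual delays [s] */
                                 float f_s, int fft_len)
{
    const float TWO_PI = 6.283185307f;
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= fft_len) return;

    float f_k = (float)k * f_s / (float)fft_len;  /* frequency of bin k */

    float ph1 = -TWO_PI * f_k * tau1[k];
    float ph2 = -TWO_PI * f_k * tau2[k];

    /* correct each station's frequency-domain sample and write back in place */
    spec1[k] = cuCmulf(spec1[k], make_cuFloatComplex(cosf(ph1), sinf(ph1)));
    spec2[k] = cuCmulf(spec2[k], make_cuFloatComplex(cosf(ph2), sinf(ph2)));
}
```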
(9) Each MPI process launches a CUDA thread block on the GPU that performs the cross-correlation: it cross-multiplies the two station signals of the baseline processed in step (8) and sums the cross-multiplication results with a parallel reduction algorithm.
In a specific implementation, each thread in the CUDA thread block takes two frequency-domain sample points, one from each of the two station signals of the VLBI baseline corresponding to this MPI process, and multiplies them; the products are then summed with a parallel reduction algorithm, and the imaginary part of the summed complex value is divided by its real part.
All threads in the CUDA thread block execute the same operation simultaneously, completing the cross-correlation of the two station signals of the VLBI baseline in a multi-threaded parallel manner; the result is copied from GPU global memory to CPU memory and output, and this result is exactly the phase fringe data of the two station signals of the baseline after VLBI correlation processing.
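The kernel below sketches the cross-multiplication and the parallel reduction summation: each thread multiplies one pair of frequency-domain samples and a shared-memory tree reduction accumulates the products; the per-block partial sums are then added (and the imaginary part divided by the real part) on the host or in a second kernel. The block size, the conjugation convention and the two-pass reduction are illustrative assumptions.

```c
/* Sketch of the step (9) kernel: cross-multiplication of the two station
 * spectra followed by a parallel (tree) reduction within each thread block.
 * Names, block size and conventions are illustrative assumptions. */
#include <cuComplex.h>

#define BLOCK 256  /* launch the kernel with blockDim.x == BLOCK */

__global__ void cross_correlate(const cuFloatComplex *spec1,
                                const cuFloatComplex *spec2,
                                cuFloatComplex *block_sums, int fft_len)
{
    __shared__ cuFloatComplex partial[BLOCK];

    int tid = threadIdx.x;
    int k   = blockIdx.x * blockDim.x + tid;

    /* cross-multiply the corresponding frequency-domain samples */
    cuFloatComplex prod = make_cuFloatComplex(0.0f, 0.0f);
    if (k < fft_len)
        prod = cuCmulf(spec1[k], cuConjf(spec2[k]));
    partial[tid] = prod;
    __syncthreads();

    /* parallel reduction summation of the products */
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            partial[tid] = cuCaddf(partial[tid], partial[tid + stride]);
        __syncthreads();
    }

    /* one partial sum per block; the final sum and the imag/real division
       that yields the phase-fringe value are done afterwards */
    if (tid == 0)
        block_sums[blockIdx.x] = partial[0];
}
```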
With the baseline-parallel scheme, the present invention effectively achieves parallel acceleration of the VLBI correlation processing procedure, makes full use of the efficient computing capability of the GPU and the good task-distribution and scheduling capability of the multi-core CPU, improves the running efficiency of VLBI correlation processing, and guarantees the flexibility and extensibility of the implementation method through the heterogeneous platform and the hybrid parallel mode.
Parts of the present invention that are not described in detail belong to common knowledge well known to those skilled in the art.

Claims (4)

1. A very long baseline interferometry (VLBI) correlation processing implementation method, characterized in that it comprises the following steps:
(1) building a development platform using a GPU and a CPU, configuring a compute unified device architecture (CUDA) environment on the platform, and configuring a message passing interface (MPI) environment on the CPU;
(2) the CPU determining the required number of MPI concurrent processes according to the number of VLBI baselines it has to process, and creating the MPI concurrent processes;
(3) the CPU assigning a corresponding VLBI baseline to each MPI process and starting all MPI processes;
(4) each MPI process obtaining the data files and parameter files of the two stations of its corresponding VLBI baseline, and obtaining the signal sample data of the two stations of the baseline, the delay values for integer-bit delay correction of the two stations, the delay values for fringe rotation, the delay values for fractional-bit delay correction, and the carrier frequency information;
(5) each MPI process, according to the signal sample data of the two stations of its corresponding VLBI baseline and the integer-bit delay correction values of the two stations, using the CUDA environment on the GPU to perform, in parallel, the integer-bit delay correction of the two station signals of the baseline, and mixing the two station signals of the baseline with the down-conversion local-oscillator signal respectively, to obtain the signals of the two stations of the baseline after integer-bit delay correction and down-conversion, the down-conversion local-oscillator signal being calculated from the down-conversion local-oscillator frequency in the carrier frequency information;
(6) each MPI process, according to the fringe-rotation delay values of the two stations of the baseline, using the CUDA environment on the GPU to apply fringe rotation in parallel to the two station signals of the baseline obtained in step (5), bringing the two station signals close to each other and obtaining the signals of the two stations of the baseline after fringe rotation;
(7) each MPI process using the CUDA environment to perform a parallel fast Fourier transform on the signals of the two stations of the baseline obtained in step (6), transforming the signals of the two stations of the baseline from the time domain to the frequency domain;
(8) each MPI process, according to the signals of the two stations of the baseline obtained in step (7) and the fractional-bit delay correction values, using the CUDA environment on the GPU to perform, in parallel, the fractional-bit delay correction of the two station signals of the baseline, obtaining the signals of the two stations of the baseline after fractional-bit delay correction;
(9) each MPI process using the CUDA environment on the GPU to cross-multiply, in parallel, the corresponding sample points of the two station signals of the baseline processed in step (8), and summing the cross-multiplication results with a parallel reduction algorithm, completing the cross-correlation of the two station signals of the baseline, obtaining and outputting the VLBI cross-correlation result, the result being the phase fringe data of the two station signals of the baseline after VLBI correlation processing;
wherein in steps (5) to (9) each MPI process is responsible only for processing the signals of the two stations of the VLBI baseline corresponding to that MPI process.
2. The very long baseline interferometry correlation processing implementation method according to claim 1, characterized in that: the development platform built with GPUs and CPUs in step (1) is a single heterogeneous platform or a small-scale cluster formed by multiple heterogeneous platforms, where one GPU combined with one CPU constitutes a heterogeneous platform.
3. The very long baseline interferometry correlation processing implementation method according to claim 1, characterized in that: when fringe rotation is applied in parallel in step (6) to the two station signals of the baseline obtained in step (5), either only the sample points of one station signal of the baseline are rotated or the sample points of both station signals of the baseline are rotated simultaneously, so that the two station signals are brought close to each other.
4. The very long baseline interferometry correlation processing implementation method according to claim 1, characterized in that: the CPU is a multi-core CPU server.
CN201410240777.5A 2014-05-30 2014-05-30 Very long baseline interferometry correlation processing implementation method Active CN104090993B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410240777.5A CN104090993B (en) 2014-05-30 2014-05-30 Very long baseline interferometry correlation processing implementation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410240777.5A CN104090993B (en) 2014-05-30 2014-05-30 Very long baseline interferometry correlation processing implementation method

Publications (2)

Publication Number Publication Date
CN104090993A CN104090993A (en) 2014-10-08
CN104090993B true CN104090993B (en) 2017-01-25

Family

ID=51638709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410240777.5A Active CN104090993B (en) 2014-05-30 2014-05-30 Very long baseline interferometry correlation processing implementation method

Country Status (1)

Country Link
CN (1) CN104090993B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105300437B (en) * 2015-11-05 2017-11-03 中国科学院上海天文台 A kind of VLBI baseband signals decimal time delay simulation method
CN105719231B (en) * 2016-01-19 2019-05-07 南京理工大学 A kind of interference data Fast Fourier Transform (FFT) method calculated based on GPU
CN107766291B (en) * 2017-09-15 2020-11-06 中国人民解放军63920部队 Method and computer equipment for obtaining residual time delay in very long baseline interferometry

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4444011B2 (en) * 2004-06-04 2010-03-31 株式会社 沖情報システムズ Remote control system
CN102201992A (en) * 2011-05-25 2011-09-28 上海理工大学 Stream processor parallel environment-oriented data stream communication system and method
CN102880785A (en) * 2012-08-01 2013-01-16 北京大学 Method for estimating transmission energy consumption of source code grade data directed towards GPU program

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
The Use of Very Long Baseline Interferometry for Time and Frequency Metrology; Yasuhiro Koyama; MAPAN; 2012-01-31; Vol. 27; pp. 1893-1899 *
Comparative analysis of VLBI parallel processing schemes (VLBI并行处理方式比较分析); Han Songtao (韩松涛) et al.; Journal of Telemetry, Tracking and Command (遥测遥控); 2013-01-15; No. 1; pp. 29-33 *
Design strategy of a cluster-based deep-space TT&C system (基于集群的深空测控系统设计策略); Cai Jiping (蔡季萍) et al.; Radio Engineering (无线电工程); 2014-05-05; No. 5; pp. 44-47, 55 *
Overview of deep-space VLBI technology and its status and development in China (深空探测VLBI技术综述及我国的现状和发展); Zhu Xinying (朱新颖) et al.; Journal of Astronautics (宇航学报); 2010-08-30; No. 8; pp. 23-30 *

Also Published As

Publication number Publication date
CN104090993A (en) 2014-10-08

Similar Documents

Publication Publication Date Title
CN202041640U (en) Satellite navigation software receiver based on GPU
CN103278829B (en) A kind of parallel navigation method for tracing satellite signal based on GPU and system thereof
CN104090993B (en) Very long baseline interferometry correlation processing implementation method
CN104850866A (en) SoC-FPGA-based self-reconstruction K-means cluster technology realization method
CN104570081A (en) Pre-stack reverse time migration seismic data processing method and system by integral method
CN103969627A (en) Ground penetrating radar large-scale three-dimensional forward modeling method based on FDTD
CN105227259B (en) A kind of parallel production method of M sequence and device
Wang et al. Reconfigurable hardware accelerators: Opportunities, trends, and challenges
CN105911532A (en) Synthetic aperture radar echo parallel simulation method based on depth cooperation
Gailing et al. Germany’s Energiewende and the spatial reconfiguration of an energy system
CN116436012B (en) FPGA-based power flow calculation system and method
CN103914428A (en) Efficient communication method of structural analysis under multi-core distributed computing environment
CN102902590A (en) Parallel digital terrain analysis-oriented massive DEM (Digital Elevation Model) deploying and scheduling method
CN112446471B (en) Convolution acceleration method based on heterogeneous many-core processor
CN113672380B (en) Phase interferometer direction-finding system for realizing FX cross-correlation phase discrimination by GPU and phase discrimination method thereof
CN106093884A (en) A kind of manifold relevant treatment implementation method of based on FPGA of improvement
CN103837878A (en) Method for acquiring GNSS satellite signal
CN103176949A (en) Circuit and method for achieving fast Fourier transform (FFT) / inverse fast Fourier transform (IFFT)
CN103400354B (en) Based on the remotely sensing image geometric correction method for parallel processing of OpenMP
CN107423030A (en) Markov Monte carlo algorithm accelerated method based on FPGA heterogeneous platforms
Chen et al. Domain decomposition approach for parallel improvement of tetrahedral meshes
CN112559197A (en) Convolution calculation data reuse method based on heterogeneous many-core processor
Svensson Occam‐pi for Programming of Massively Parallel Reconfigurable Architectures
CN113672541A (en) PCM/FM telemetering signal incoherent demodulation implementation method based on GPU
CN102608600A (en) FPGA (field-programmable gate array)-based step frequency image splicing implementation method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant