CN115309555A - Parallel computing method and system for satellite, storage medium and equipment - Google Patents

Parallel computing method and system for satellite, storage medium and equipment

Info

Publication number
CN115309555A
CN115309555A (application CN202210946245.8A)
Authority
CN
China
Prior art keywords
calculation
processor
data
parallel computing
processing unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210946245.8A
Other languages
Chinese (zh)
Other versions
CN115309555B (en)
Inventor
李昭男
曾伟刚
李琮
董卫华
张治国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Zhongke Tianta Technology Co ltd
Original Assignee
Xi'an Zhongke Tianta Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Zhongke Tianta Technology Co ltd filed Critical Xi'an Zhongke Tianta Technology Co ltd
Priority to CN202210946245.8A
Publication of CN115309555A
Application granted
Publication of CN115309555B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5011: Allocation of resources to service a request, the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F 9/5016: Allocation of resources to service a request, the resource being the memory
    • G06F 9/44: Arrangements for executing specific programs
    • G06F 9/448: Execution paradigms, e.g. implementations of programming paradigms
    • G06F 9/4488: Object-oriented
    • G06F 9/449: Object-oriented method invocation or resolution
    • G06F 7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48: Computations using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/483: Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F 7/544: Computations for evaluating functions by calculation
    • G06F 7/548: Trigonometric functions; Co-ordinate transformations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Nonlinear Science (AREA)
  • Mathematical Physics (AREA)
  • Multi Processors (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a parallel computing method and aims to solve the technical problems that arise when satellite data are currently computed with an integrated solution method: each computation is complex, a large amount of memory is occupied, and when the number of satellites computed in parallel reaches a certain level, insufficient cache causes overflow errors and the solution fails.

Description

Parallel computing method and system for satellite, storage medium and equipment
Technical Field
The invention belongs to the field of parallel computing, and particularly relates to a parallel computing method and system for satellites, a computer-readable storage medium, and a terminal device.
Background
The construction of a low-orbit satellite constellation communication system capable of covering the whole world is a necessary path for the development of the aerospace industry. By 2017, there were already as many as 18,000 man-made objects in near-Earth orbit. Meanwhile, with the development of small-satellite technology, the number of satellites in low-orbit constellation networks keeps increasing, and the number of low-orbit broadband satellites that SpaceX plans to launch exceeds 4,000.
At present, when satellite data are calculated, serial computation with the SGP4/SDP4 algorithm on a central processing unit (CPU) is usually used. Although CPU performance has improved rapidly over recent decades, CPU-based calculation is still very time-consuming in the face of the growing number of satellites; for example, calculating per-second position data over 24 hours for 200 satellites takes approximately four minutes. Therefore, as the number of satellites keeps increasing, using a CPU for the calculation is no longer practical. Addressing this problem, in "Calculation and analysis of low-orbit satellite constellation coverage performance based on graphics processor acceleration", the authors adopt a modular acceleration method, using the CUDA library provided by NVIDIA and a compatible graphics processor to run acceleration tests on the SGP4/SDP4 orbit model. Kong Fanze et al. propose two graphics-processor acceleration methods, an integrated solution method and a modular solution method, in "GPU integrated parallel acceleration method for satellite orbit recursion". In the integrated solution method, the SGP4 solution model (including the initial constant definitions) is added to the kernel function as a whole, so the computer memory needs only one interactive data transfer with the graphics-processor memory, one kernel function call, and one video-memory allocation, which saves the time of repeated parameter exchanges and function calls compared with the modular method; however, each computation is complex, a large amount of video memory is occupied, and when the number of satellites computed in parallel reaches a certain level, insufficient cache causes overflow errors and the solution fails. The modular solution method divides the calculation into four modules: gravity perturbation constant initialization, conversion between two-line elements and orbit determination parameters, SGP4 orbit model initialization, and SGP4 orbit prediction. Except for the gravity perturbation constant initialization module, the other three modules compute in parallel on the graphics processor; each module requires little video memory, so large-scale parallel satellite orbit calculation can be achieved, but a large amount of time is spent on parameter exchanges between the CPU and the graphics processor, so the acceleration effect is not ideal.
Disclosure of Invention
The invention provides a parallel computing method and system for satellites, a computer-readable storage medium, and a terminal device. It aims to solve the following technical problems in current satellite-data computation: with the integrated solution method, each computation is complex, a large amount of video memory is occupied, and when the number of satellites computed in parallel reaches a certain level, insufficient cache causes overflow errors and the solution fails; with the modular solution method, a large amount of time is spent on parameter exchanges between the central processing unit and the graphics processor, so the acceleration effect is not ideal.
To achieve this purpose, the invention adopts the following technical scheme:
a parallel computing method for satellites, characterized by comprising the following steps:
s1, performing disturbance data initialization processing in a central processing unit to obtain a disturbance data initialization result;
s2, reading TLE data through a central processing unit, and extracting a time period, a stepping value and a TLE data column which need to be calculated;
s3, taking a single satellite orbit parameter from the TLE data column through a central processing unit for initialization;
s4, copying a disturbance data initialization result, the single satellite orbit parameters initialized in the step S3, time period information needing to be calculated and step value information to a graphic processor;
s5, ephemeris calculation and conversion under different time nodes are respectively executed through a plurality of calculation units in the graphic processor, and calculation results are obtained;
s6, reading and storing the calculation result of the step S5 through a central processing unit; and judging whether ephemeris calculation and conversion of all satellites are finished, if so, finishing parallel calculation, and otherwise, returning to the step S3 until ephemeris calculation and conversion of all satellites are finished.
Further, in step S5, the computing units are in one-to-one correspondence with the time nodes.
Further, the storing in step S6 and step S3 are executed in two separate threads of the central processing unit.
Further, the video memory of the graphics processor comprises a constant memory and a global memory;
in step S4, the disturbance data initialization result is stored in the constant memory, and the other data in the graphics processor are stored in the global memory.
Further, the central processing unit calls class methods during operation;
when the graphics processor calls functions during operation, the function table pointer is deleted and the class object data in the class method is passed separately as a structure.
Further, in step S5, the time information and coordinate information in the calculation result use a structure-of-arrays data type.
Further, in step S5, when the plurality of computing units in the graphics processor execute the ephemeris calculation and conversion at different time nodes, the trigonometric function calculations in the ephemeris calculation and conversion are degraded from double-precision to single-precision floating point.
For the above parallel computing method, the invention provides a parallel computing system for satellites, used for implementing the parallel computing method for satellites, characterized by comprising a central processing unit and a graphics processor;
the central processing unit is used for performing the disturbance data initialization to obtain the disturbance data initialization result, reading the TLE data, taking the single-satellite orbit parameters from the TLE data and initializing them, copying the disturbance data initialization result and the initialized single-satellite orbit parameters to the graphics processor, reading and storing the calculation results of each computing unit, and judging whether the ephemeris calculation and conversion of all satellites have been completed;
and the graphics processor is used for executing the ephemeris calculation and conversion at different time nodes through its plurality of computing units.
The invention also provides a computer-readable storage medium on which a computer program is stored, characterized in that the program implements the steps of the above method when executed by a processor.
In addition, the invention provides a terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the above method when executing the computer program.
Compared with the prior art, the invention has the following beneficial effects:
1. Compared with the existing serial double-loop calculation method on the CPU, the parallel computing method for satellites provided by the invention completes the ephemeris calculation and conversion, which would otherwise require repeated looping, on a plurality of computing units of the GPU. Logically, the ephemeris calculation and conversion of a single satellite, which originally required many loop iterations on the CPU, can be completed in a single pass, thereby achieving accelerated calculation.
2. In the invention, the disturbance data are initialized and the TLE data are read in the CPU, which avoids multiple computing units of the GPU performing the same calculation on the same data and saves video memory and computing resources of the GPU.
3. The storage of the GPU calculation results and the initialization of the single-satellite orbit parameters in the CPU are executed in two separate threads, which prevents the CPU from stalling and idling during storage and further improves execution speed.
4. The GPU is provided with a constant memory for storing the disturbance data initialization result; the function table pointer is deleted during GPU operation; the time and coordinate information in the calculation results inside the GPU uses a structure-of-arrays data type; and double-precision floating point is degraded to single precision when the GPU executes trigonometric function calculations. These measures greatly improve the calculation speed.
5. The invention also provides a computer-readable storage medium and a terminal device capable of executing the steps of the method, so that the method can be popularized and applied on the corresponding hardware devices.
Drawings
FIG. 1 is a schematic flow chart of a serial double-layer loop calculation method of a conventional CPU;
FIG. 2 is a flowchart illustrating an embodiment of a parallel computing method for satellites according to the present invention;
FIG. 3 is a diagram illustrating an AoS data structure according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an SoA data structure in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
FIG. 1 shows the flow of the existing serial calculation method on a central processing unit, which adopts a serial double-loop scheme. After the calculation starts, the disturbance data are initialized and the TLE data are read; a single satellite is taken and its orbit parameters are initialized; the SGP4/SDP4 orbit model is initialized; coordinates are calculated through the SGP4/SDP4 orbit model and ephemeris conversion is performed. If the current step number is less than the total step number, the coordinates are recalculated through the SGP4/SDP4 orbit model; if the current step number is greater than or equal to the total step number, it is judged whether the satellite count is less than the total number of satellites. If it is, the next satellite is taken, its orbit parameters are initialized, and the calculation is repeated; if the satellite count equals the total number of satellites, the calculation ends.
The step number mentioned above refers to the count of calculation steps, and the step value refers to the time span of each calculation; for example, a step value of 1 means the position data are calculated every second, and a step value of 2 means every two seconds. The total number of steps corresponds to all the time nodes to be calculated.
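For orientation, the prior-art flow of FIG. 1 can be summarized in a minimal host-side sketch. The types and the sgp4Init/sgp4Compute/ephemerisConvert stubs below are hypothetical placeholders standing in for an SGP4/SDP4 implementation; they are not the actual code of any cited library.

```cuda
#include <vector>

// Hypothetical placeholder types and stubs for an SGP4/SDP4 implementation.
struct Tle        { double line1[9], line2[9]; };   // parsed two-line element fields
struct OrbitState { double elems[12]; };            // initialized orbit parameters
struct EcefCoord  { double t, x, y, z; };           // time plus coordinates

static OrbitState sgp4Init(const Tle&)                     { return {}; }            // stub
static EcefCoord  sgp4Compute(const OrbitState&, double t) { return {t, 0, 0, 0}; }  // stub
static EcefCoord  ephemerisConvert(const EcefCoord& c)     { return c; }             // stub

// Serial double loop of FIG. 1: outer loop over satellites, inner loop over time nodes.
void serialPropagate(const std::vector<Tle>& tles, double startMin,
                     int totalSteps, double stepMin, std::vector<EcefCoord>& out)
{
    for (const Tle& tle : tles) {                 // take one satellite at a time
        OrbitState st = sgp4Init(tle);            // initialize orbit parameters
        for (int k = 0; k < totalSteps; ++k) {    // total step count = number of time nodes
            double t = startMin + k * stepMin;    // step value = time span per calculation
            out.push_back(ephemerisConvert(sgp4Compute(st, t)));  // coordinates + conversion
        }
    }
}
```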
Compared with the serial double-loop computation on the CPU, the core idea of the GPU parallel computing method for satellites designed by the invention is as follows: the coordinate data of the multiple time nodes that originally had to be computed in a loop are distributed to a plurality of computing units of the GPU, and each computing unit performs the ephemeris calculation and conversion for different time nodes. Logically, the ephemeris calculation and conversion of a single satellite, which originally required many loop iterations on the CPU, can therefore be completed in a single pass, achieving accelerated calculation.
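A minimal CUDA sketch of this core idea, under the same placeholder types as above: each kernel thread stands in for one computing unit and handles one time node, so the inner time loop of the serial method disappears. The device functions are stubs for the SGP4/SDP4 math, not a real library API.

```cuda
#include <cuda_runtime.h>

struct OrbitState { double elems[12]; };
struct EcefCoord  { double t, x, y, z; };

// Stubs standing in for the device-side SGP4/SDP4 propagation and ephemeris conversion.
__device__ EcefCoord sgp4ComputeDev(const OrbitState&, double t) { return {t, 0, 0, 0}; }
__device__ EcefCoord ephemerisConvertDev(const EcefCoord& c)     { return c; }

// One thread handles one time node of the current satellite.
__global__ void propagateOneSatellite(OrbitState st, double startMin, double stepMin,
                                      int totalSteps, EcefCoord* results)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;   // index of this thread's time node
    if (k < totalSteps) {
        double t = startMin + k * stepMin;
        results[k] = ephemerisConvertDev(sgp4ComputeDev(st, t));
    }
}
```

A launch such as propagateOneSatellite<<<(totalSteps + 255) / 256, 256>>>(st, startMin, stepMin, totalSteps, devResults) then evaluates all time nodes of one satellite in a single kernel call.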
As shown in FIG. 2, the parallel computing method for satellites according to the present invention includes the following specific steps:
(1) The orbit disturbance data are read, parsed, and initialized by the CPU (central processing unit). The specific process is: read the text file storing the orbit disturbance data, convert the read text into a data structure usable in later calculations, and store it in memory.
(2) A TLE (two-line element set) data file is read by the CPU, and the time period to be calculated, the step value, and the TLE data column are extracted. The loop portion of the calculation is then ready to begin.
(3) The TLE data of one satellite are extracted from the TLE data column, parsed, and initialized. Initialization is needed because the TLE data of a single satellite extracted from the TLE data column are orbital elements in a storage format, which must be converted and processed to obtain usable orbit parameters. The initialization can be completed using existing algorithms.
(4) The disturbance data initialization result, the initialized single-satellite orbit parameters, and the time period and step value extracted from the TLE data file are copied to the GPU.
(5) Ephemeris calculation and conversion are completed by the GPU (graphics processing unit): a plurality of computing units in the GPU execute the ephemeris calculation and conversion at different time nodes. One computing unit may handle the ephemeris calculation and conversion of one time node, or one computing unit may handle several time nodes. In a specific implementation, the allocation of time nodes is completed through the video memory driver in the GPU, and the time nodes are evenly allocated to each computing unit (a possible allocation scheme is sketched after step (6) below).
(6) After the GPU completes the ephemeris calculation and conversion, the calculation results are read and stored, and it is judged whether the ephemeris calculation and conversion of all satellites have been completed. If so, the parallel calculation ends; otherwise, the process returns to step (3), where the TLE data of the next satellite are taken out, parsed, and initialized, and the calculation continues until the ephemeris calculation and conversion of all satellites are completed and all data are stored, at which point the calculation exits.
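One possible way to let a single computing unit handle several time nodes, as mentioned in step (5), is a grid-stride loop; the even assignment shown here is an illustrative assumption, since the text leaves the actual allocation to the GPU driver. The sketch reuses the placeholder types and device stubs from the kernel sketch above.

```cuda
// Grid-stride variant: when there are more time nodes than launched threads,
// each thread takes every (gridDim.x * blockDim.x)-th node, which spreads the
// time nodes evenly across the computing units.
__global__ void propagateGridStride(OrbitState st, double startMin, double stepMin,
                                    int totalSteps, EcefCoord* results)
{
    int stride = gridDim.x * blockDim.x;
    for (int k = blockIdx.x * blockDim.x + threadIdx.x; k < totalSteps; k += stride) {
        double t = startMin + k * stepMin;
        results[k] = ephemerisConvertDev(sgp4ComputeDev(st, t));
    }
}
```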
The storing on the CPU and step (3), in which the TLE data of one satellite are taken out, parsed, and initialized, may be executed asynchronously in two threads. Specifically, the calculation results from the GPU are stored into a buffer area in CPU memory; while a second thread performs the storing, the CPU simultaneously extracts, parses, and initializes the TLE data of the next satellite in the first thread. This asynchronous storage avoids the process stalling while the CPU waits for calculation results, and also removes the need for an oversized video memory: multi-satellite calculation can be completed with only a small amount of video memory.
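A sketch of the two-thread scheme described above, under stated assumptions: saveToDisk and initNextSatellite are hypothetical placeholders for "store the previous results" and "parse and initialize the next TLE", and a simple double buffer in CPU memory plays the role of the buffer area.

```cuda
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

struct EcefCoord { double t, x, y, z; };

// Hypothetical placeholders, not part of the patent's actual code.
static void saveToDisk(const std::vector<EcefCoord>& r) { std::printf("saved %zu results\n", r.size()); }
static void initNextSatellite(int /*index*/)            { /* parse the next TLE, initialize SGP4 */ }

// Thread 1 (this function) copies results back and prepares the next satellite
// while thread 2 (the writer) stores the previous satellite's results.
void processAllSatellites(int numSats, int totalSteps, EcefCoord* devResults)
{
    std::vector<EcefCoord> hostBuf[2] = { std::vector<EcefCoord>(totalSteps),
                                          std::vector<EcefCoord>(totalSteps) };
    std::thread writer;

    for (int s = 0; s < numSats; ++s) {
        // ... launch the propagation kernel for satellite s here ...
        cudaMemcpy(hostBuf[s & 1].data(), devResults,
                   totalSteps * sizeof(EcefCoord), cudaMemcpyDeviceToHost);

        if (writer.joinable()) writer.join();                         // previous save finished?
        writer = std::thread(saveToDisk, std::cref(hostBuf[s & 1]));  // thread 2: store results
        initNextSatellite(s + 1);                                     // thread 1: next TLE
    }
    if (writer.joinable()) writer.join();
}
```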
In the method, the parsing and initialization of the disturbance data, orbit parameters, and so on are first completed by the CPU, which avoids multiple GPU computing units performing the same calculation on the same data and saves a certain amount of GPU memory space and computing resources. Secondly, the method returns the result of each calculation to the CPU and stores it asynchronously, saving the previous calculation result while waiting for the GPU to finish the current one. This avoids the process stalling while the CPU waits for calculation results, removes the need for an oversized video memory, and allows multi-satellite calculation with only a small amount of video memory.
However, returning the calculation result data to the CPU after each calculation causes frequent data exchange between the CPU and the GPU. In most cases, the CPU memory and the GPU video memory are connected via a PCIe (Peripheral Component Interconnect Express) bus, and frequent, large data exchanges take a certain amount of time and bus bandwidth. In addition, performance analysis shows that the read/write performance of this calculation method is not high; in particular, many conflicts occur during memory access. Furthermore, a large number of trigonometric functions are evaluated during ephemeris calculation and conversion, and double-precision floating-point trigonometric operations are very time-consuming.
To address these problems, the invention also designs several optimization methods to increase the speed of the parallel computing method.
(1) Setting constant memory
Constant memory is characterized by holding only one copy of the data in video memory space. When multiple threads access the same constant-memory address, only one chip read/write cycle is needed, after which the data are broadcast to all threads. The read/write cycle here refers to the minimum time interval required between two consecutive read/write operations on the memory chip; since some memories need a certain recovery time after an access, the access cycle is usually greater than or equal to the access time, and consecutive reads and writes must be separated by a full read/write cycle.
Global memory is characterized as follows: when multiple threads access different global-memory addresses (addresses not within the same access granularity), there is no conflict and performance is best; when multiple threads access the same global-memory address, a conflict occurs and the threads must queue to read. In this scenario the data are only read as parameters and never modified, and copying the same data for every thread would consume an enormous amount of memory. Therefore, when the disturbance data initialization result, the initialized single-satellite orbit parameters, the time period to be calculated, and the step value are copied to the graphics processor, the disturbance data initialization result is stored in constant memory and the other data are stored in global memory. Storing the disturbance data initialization result in constant memory effectively saves memory resources.
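A minimal sketch of this arrangement, assuming a hypothetical GravConst layout for the perturbation constants (the field names follow common SGP4 conventions and are not taken from the patent):

```cuda
#include <cuda_runtime.h>

// Hypothetical layout of the gravity/perturbation constants produced by the
// CPU-side initialization; field names follow common SGP4 conventions.
struct GravConst { double tumin, mu, radiusearthkm, xke, j2, j3, j4; };

__constant__ GravConst d_grav;   // a single copy in constant memory for the whole grid

void uploadPerturbationConstants(const GravConst& hostGrav)
{
    // Copy the disturbance-data initialization result into constant memory once,
    // instead of duplicating it per thread in global memory.
    cudaMemcpyToSymbol(d_grav, &hostGrav, sizeof(GravConst));
}
```

Device code then reads d_grav directly; because every thread reads the same address, the access is served by a single broadcast rather than by queued global-memory reads.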
(2) Deleting inefficient data parameters
In the existing CPU algorithm, the calculation is completed inside a class object, so class methods can be called without passing a pointer. In the GPU-accelerated algorithm, however, using a class method would require passing a pointer to the GPU and copying the entire class object, including its function table pointer, into GPU video memory, which wastes memory resources. Therefore, the function table pointer of the class object is deleted: the required data are passed separately as a structure, and the class method is extracted and called as an ordinary function, which saves memory.
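The change can be illustrated as follows; the class and struct shown are simplified stand-ins, not the invention's actual data layout. The original class (shown in comments) would carry a function-table pointer if copied to the GPU, while the rewritten version passes only plain data and calls an ordinary device function.

```cuda
// Before (illustrative): a class whose virtual method implies a function-table
// (vtable) pointer that would be copied into GPU video memory along with the object.
//
//   class Sgp4Model {
//   public:
//       virtual EcefCoord compute(double t);   // dispatched through the function table
//   private:
//       OrbitState state;
//   };
//
// After: the required data are passed separately as a plain structure and the
// class method becomes an ordinary __device__ function, so no function-table
// pointer is transferred and no memory is wasted on it.
struct OrbitState { double elems[12]; };
struct EcefCoord  { double t, x, y, z; };

__device__ EcefCoord sgp4Compute(const OrbitState& state, double t)
{
    // ... SGP4 propagation using only the plain data in state ...
    return {t, 0.0, 0.0, 0.0};   // placeholder result
}
```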
(3) Converting AoS to SoA
AoS refers to an array of structures, whose form is shown in FIG. 3. SoA refers to a structure of arrays, whose form is shown in FIG. 4. On the GPU, SoA has better read/write performance than AoS, because each SoA memory access costs far fewer read cycles on the memory chip than an AoS access. With an array of structures (AoS), the variables of the same structure element (for example, AoS[2]) are stored in adjacent memory; when adjacent data fall within the same read/write granularity, simultaneous reads and writes of adjacent variables conflict, threads must queue, and the number of chip read/write cycles increases. With a structure of arrays (SoA), data that were adjacent in the AoS layout are far apart, so accesses to the same index do not conflict and can be completed in only a few read/write cycles. This method greatly improves the read/write performance of the algorithm.
The calculation result obtained from ephemeris calculation and conversion on the GPU contains time and coordinate information, and the data structures for the time and coordinates can be optimized into SoA form.
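A sketch of the two layouts of FIG. 3 and FIG. 4 for the time-and-coordinate results; the field names and the size N are illustrative assumptions.

```cuda
#define N 86400   // e.g. one day of results at a step value of 1 second (illustrative)

// AoS (FIG. 3): the fields of one result are adjacent, so thread k writing aos[k].x
// and thread k+1 writing aos[k+1].x touch addresses four doubles apart; neighbouring
// accesses fall into the same read/write granularity, conflict, and force threads to queue.
struct ResultAoS { double t, x, y, z; };
ResultAoS aos[N];

// SoA (FIG. 4): the same field of all results is contiguous, so thread k writing
// soa.x[k] and thread k+1 writing soa.x[k+1] form one coalesced, conflict-free access.
struct ResultSoA { double t[N], x[N], y[N], z[N]; };
ResultSoA soa;
```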
(4) Converting double-type operations to float-type operations
When the GPU executes ephemeris calculation and conversion, it performs many relatively complex operations such as trigonometric functions, which are slow; the speed can be improved by degrading double-precision floating-point numbers. Because high-precision calculation cannot be avoided everywhere, only the trigonometric function calculations in specific functions are degraded from double precision to single precision. In the calculation results optimized in this way, the error appears only in the last digit after the decimal point, that digit differs by at most 1 from the result before the precision reduction, and the probability of such an error is below 1 percent, which is within an acceptable range.
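A hedged sketch of the degraded trigonometric calls: the fastSin/fastCos helpers and the Kepler-equation example are illustrative assumptions, not the invention's actual code paths; only the trigonometric evaluations are narrowed to single precision, while the surrounding arithmetic stays in double precision.

```cuda
// Degrade only the trigonometric evaluation: narrow the argument to float, use the
// single-precision sinf/cosf (much faster on GPUs with few FP64 units), and widen
// the result back to double. Per the text, the resulting error shows up only in
// the last decimal digit.
__device__ __forceinline__ double fastSin(double x) { return (double)sinf((float)x); }
__device__ __forceinline__ double fastCos(double x) { return (double)cosf((float)x); }

// Illustrative use: a Newton iteration for Kepler's equation with the degraded trig.
__device__ double meanToEccentricAnomaly(double M, double e)
{
    double E = M;
    for (int i = 0; i < 10; ++i)
        E -= (E - e * fastSin(E) - M) / (1.0 - e * fastCos(E));
    return E;
}
```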
Based on the above computing method, the invention also provides a parallel computing system for satellites that can implement the parallel computing method for satellites, comprising a central processing unit and a graphics processor. The central processing unit is used for performing the disturbance data initialization to obtain the disturbance data initialization result, reading the TLE data, taking the single-satellite orbit parameters from the TLE data and initializing them, copying the disturbance data initialization result and the initialized single-satellite orbit parameters to the graphics processor, reading and storing the calculation results of each computing unit, and judging whether the ephemeris calculation and conversion of all satellites have been completed. The graphics processor is used for executing the ephemeris calculation and conversion at different time nodes through its plurality of computing units.
Within the parallel computing system, the above optimizations can be applied inside the central processing unit and the graphics processor to achieve better computing speed and results.
The parallel computing method combined with the above optimization methods was tested in practice. In the test, an RTX 3070 graphics card was used to calculate 24 hours of data for 8,000 satellites with a step value of 1; the runtime was reduced from the original 110 seconds to 20 seconds after optimization, a significant improvement.
The parallel computing method of the present invention can be applied to a computer-readable storage medium: the method is stored as a computer program on the computer-readable storage medium, and the computer program implements the steps of the parallel computing method when executed by a processor.
In addition, the parallel computing method of the present invention may also be applied to a terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the parallel computing method of the present invention are implemented. The terminal device here may be a computer, a notebook, a handheld computer, or various computing devices such as a cloud server, and the processor may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, or another programmable logic device.
In conclusion, the invention improves and optimizes the SGP4/SDP4 algorithm computed on a CPU into an accelerated algorithm computed in parallel on a GPU. Compared with the existing CPU calculation, the algorithm of the invention still achieves a speedup ratio of 40 to 70 on ordinary PC hardware, even when the CPU version uses multi-core optimization. The whole algorithm flow is fully controllable and the calculation results are reliable, which is of great significance for orbit calculation and the aerospace industry.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A parallel computing method for satellites, comprising the following steps:
S1, performing disturbance data initialization in a central processing unit to obtain a disturbance data initialization result;
S2, reading TLE data through the central processing unit, and extracting the time period to be calculated, the step value, and the TLE data column;
S3, taking the orbit parameters of a single satellite from the TLE data column through the central processing unit and initializing them;
S4, copying the disturbance data initialization result, the single-satellite orbit parameters initialized in step S3, the time period to be calculated, and the step value to a graphics processor;
S5, executing ephemeris calculation and conversion at different time nodes through a plurality of computing units in the graphics processor, and obtaining the calculation results;
S6, reading and storing the calculation results of step S5 through the central processing unit; judging whether the ephemeris calculation and conversion of all satellites have been completed; if so, the parallel calculation ends, otherwise the process returns to step S3 until the ephemeris calculation and conversion of all satellites are completed.
2. A parallel computing method for satellites according to claim 1, characterized in that: in step S5, the calculation units correspond to the time nodes one to one.
3. A parallel computing method for satellites according to claim 1, characterized in that: the storing in step S6 and step S3 are executed in two separate threads of the central processing unit.
4. A parallel computing method for satellites according to claim 1, characterized in that: the video memory of the graphics processor comprises a constant memory and a global memory;
in step S4, the disturbance data initialization result is stored in the constant memory, and the other data in the graphics processor are stored in the global memory.
5. A parallel computing method for satellites according to claim 1, characterized in that:
the central processing unit calls class methods during operation;
when the graphics processor calls functions during operation, the function table pointer is deleted and the class object data in the class method is passed separately as a structure.
6. A parallel computing method for satellites according to claim 1, characterized in that: in step S5, the time information and coordinate information in the calculation result use a structure-of-arrays data type.
7. A parallel computing method for satellites according to claim 6, characterized in that: in step S5, when the plurality of computing units in the graphics processor execute the ephemeris calculation and conversion at different time nodes, the trigonometric function calculations in the ephemeris calculation and conversion are degraded from double-precision to single-precision floating point.
8. A parallel computing system for satellites, for implementing a parallel computing method for satellites according to any one of claims 1 to 7, characterized in that: it comprises a central processing unit and a graphics processor;
the central processing unit is used for performing the disturbance data initialization to obtain the disturbance data initialization result, reading the TLE data, taking the single-satellite orbit parameters from the TLE data and initializing them, copying the disturbance data initialization result and the initialized single-satellite orbit parameters to the graphics processor, reading and storing the calculation results of each computing unit, and judging whether the ephemeris calculation and conversion of all satellites have been completed;
and the graphics processor is used for executing the ephemeris calculation and conversion at different time nodes through its plurality of computing units.
9. A computer-readable storage medium having stored thereon a computer program, characterized in that: the program when executed by a processor implementing the steps of the method according to any one of claims 1 to 7.
10. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that: the processor, when executing the computer program, realizes the steps of the method according to any of claims 1 to 9.
CN202210946245.8A 2022-08-08 2022-08-08 Parallel computing method and system for satellite, storage medium and equipment Active CN115309555B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210946245.8A CN115309555B (en) 2022-08-08 2022-08-08 Parallel computing method and system for satellite, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210946245.8A CN115309555B (en) 2022-08-08 2022-08-08 Parallel computing method and system for satellite, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN115309555A (en) 2022-11-08
CN115309555B (en) 2024-03-15

Family

ID=83860373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210946245.8A Active CN115309555B (en) 2022-08-08 2022-08-08 Parallel computing method and system for satellite, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN115309555B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190058522A1 (en) * 2016-02-25 2019-02-21 Myriota Pty Ltd Terminal scheduling method in satellite communication system
CN106055780A (en) * 2016-05-26 2016-10-26 北京航空航天大学 Space debris pre-warning method based on GPU acceleration
CN108073455A (en) * 2016-11-11 2018-05-25 南京航空航天大学 A kind of satellite navigation signals Parallel Simulation method based on GPU
CN106681697A (en) * 2016-12-29 2017-05-17 中国电子科技集团公司第五十四研究所 Method for parallel implementation of target access calculation under CUDA architecture
US20190288771A1 (en) * 2018-03-16 2019-09-19 Vector Launch Inc. Quality of service level selection for peer satellite communications
CN109059937A (en) * 2018-08-21 2018-12-21 深圳市天智运控科技有限公司 A kind of autonomous satellite orbit prediction method on star
CN111127295A (en) * 2019-11-22 2020-05-08 哈尔滨工业大学 SGP4 orbit model integrated parallel method based on GPU
CN114327919A (en) * 2022-03-14 2022-04-12 北京航天驭星科技有限公司 Space target collision early warning method and system
CN114722617A (en) * 2022-04-19 2022-07-08 中国科学院计算技术研究所 Heterogeneous parallel computing method for satellite orbit prediction by numerical method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MINGPEI LIN et al.: "A parallel algorithm for the initial screening of space debris collision prediction using the SGP4/SDP4 models and GPU acceleration", Advances in Space Research, vol. 59, no. 9, 1 May 2017, pages 2398-2406, XP029965114, DOI: 10.1016/j.asr.2017.02.023 *
KONG Fanze et al.: "GPU integrated parallel acceleration method for satellite orbit recursion", Journal of Harbin Institute of Technology, vol. 53, no. 06, pages 2-4 *
YANG Ruihong: "Research on fast algorithms for satellite orbit prediction", China Excellent Degree Theses Full-text Database, Engineering Science and Technology II, no. 02, 15 February 2018, pages 031-323 *

Also Published As

Publication number Publication date
CN115309555B (en) 2024-03-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant