CN110543361A - Astronomical data parallel processing device and method - Google Patents

Astronomical data parallel processing device and method


Publication number
CN110543361A
Authority
CN
China
Prior art keywords
astronomical data
data processing
parallel
astronomical
processing program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910693839.0A
Other languages
Chinese (zh)
Other versions
CN110543361B (en)
Inventor
李长华
崔辰州
李正
韩叙
和兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Astronomical Observatories of CAS
Original Assignee
National Astronomical Observatories of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-07-29
Filing date
2019-07-29
Publication date
2019-12-06
Application filed by National Astronomical Observatories of CAS
Priority to CN201910693839.0A
Publication of CN110543361A
Application granted
Publication of CN110543361B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an astronomical data parallel processing device and method. The processing device is a computing cluster comprising a management server, a storage server and a plurality of computing servers, wherein the storage server stores a plurality of astronomical data files, parameter files and astronomical data processing instructions, and the instructions are executed by the plurality of computing servers simultaneously. The method comprises the following steps: the management server runs a starting module; the starting module distributes the parallel module to the plurality of computing servers for execution and assigns a task number to the astronomical data processing program; the parallel module starts the astronomical data processing program, extracts parameters from a parameter file in the storage server according to the task number, and inputs the parameters into the astronomical data processing program; and the astronomical data processing program processes the astronomical data file in the storage server according to the parameters and saves the result in the storage server.

Description

Astronomical data parallel processing device and method
Technical Field
The invention relates to the field of astronomical data processing, in particular to an astronomical data parallel processing device and method.
Background
With the continued construction and precision upgrades of astronomical observation equipment, the capacity to acquire astronomical data has grown greatly, and astronomical research has entered the big data era. Existing astronomical data processing programs can no longer meet the time requirements of big data processing, so large-scale parallelism has become a necessary means of accelerating astronomical data processing.
HPC (High Performance Computing) systems are the main environment for parallel computing. In such a system, a plurality of computing servers with the same architecture form a computing cluster over a high-speed network, and parallel processing software distributes computing tasks across the different computing servers so that the tasks execute in parallel. Parallel computing software is therefore an essential component, alongside the necessary computing hardware, for carrying out parallel computing tasks, and parallel software for the HPC environment is currently developed on the basis of MPI (Message Passing Interface).
MPI is a standard interface for parallel program development in the HPC environment. It defines a basic framework for parallel programs and the way data is exchanged between processes. Several implementations exist, such as Open MPI, Intel MPI and MPICH, but they follow the same principles. In a large-scale data processing environment, different processes need to execute different computing tasks or handle different input data. The usual approach is to redesign the original program for parallelism according to the MPI framework, so that different processes perform different actions according to their process numbers and thereby execute correctly in parallel in the HPC environment. Otherwise, although multiple processes are started, they all execute the same commands, and the goal of processing data in parallel is not achieved.
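To illustrate the conventional approach described above, the following minimal Python sketch shows how a program redesigned for MPI typically selects its share of the input from its process number (rank). The function name `files_for_rank`, the file names, and the round-robin split are illustrative assumptions and do not come from the patent; in a real MPI program the rank and the process count would be obtained from `MPI_Comm_rank` and `MPI_Comm_size`.

```python
def files_for_rank(files, rank, size):
    """Return the subset of `files` handled by the process with this rank.

    Round-robin split (an illustrative assumption): rank r takes the
    items at positions r, r + size, r + 2 * size, and so on.
    """
    if not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    return files[rank::size]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = [f"spec_{i:03d}.fits" for i in range(10)]
    for rank in range(4):
        print(rank, files_for_rank(files, rank, 4))
```

The point of the sketch is that this selection logic has to be written into the program itself, which is the modification the invention seeks to avoid.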
Parallelizing an existing serial program is very complicated technical work. On the one hand, users of the program may be unable to carry out the conversion because they lack the source code or are unfamiliar with it. On the other hand, even when the source code is available, parallelization may change the original program flow and make the program harder to port between different computing environments.
Disclosure of Invention
(I) Technical problem to be solved
The invention provides an astronomical data parallel processing device and method that at least partially overcome the low speed and low efficiency of serial astronomical data processing in existing methods.
(II) Technical solution
According to an aspect of the present invention, there is provided an apparatus for parallel processing of astronomical data, comprising: the computing cluster comprises a management server, a storage server and a plurality of computing servers; the storage server is used for storing a plurality of astronomical data files, parameter files and astronomical data processing instructions, the instructions are executed by a plurality of computing servers at the same time, and the execution comprises the following steps: simultaneously extracting parameters from the parameter file; and initiating an astronomical data processing task, and distributing the astronomical data to a plurality of computing servers in parallel for operation.
In a further aspect, the management server, the storage server, and the computing servers are connected via an Ethernet network and operate in the same network segment.
According to another aspect of the present invention, there is also provided a method for parallel processing of astronomical data, comprising: operating a starting module; distributing the parallel module to a plurality of computing servers for operation, and distributing a task number for the astronomical data processing program; starting an astronomical data processing program, extracting parameters from a parameter file in a storage server according to the task number, and inputting the parameters into the astronomical data processing program; and processing the astronomical data file in the storage server according to the parameters, and storing the result.
In a further aspect, the astronomical data processing program runs in the plurality of compute servers simultaneously.
In a further aspect, the parallel module runs in the plurality of computing servers simultaneously, extracts parameters from the parameter file and starts an astronomical data processing program, and the parallel module complies with the MPI standard specification.
In a further aspect, the starting module, running on the management server, initiates an astronomical data processing task and distributes the parallel modules to a plurality of computing servers for running.
In a further aspect, the parameter file is in a text file format, and each row includes a task number and an astronomical data file allocated to an astronomical data processing program corresponding to the task number.
(III) Advantageous effects
The invention redesigns the flow of the parallel framework by adding the parallel module as middleware between the MPI framework and the astronomical data processing program. The parallel module inputs different parameters to the astronomical data processing program on different computing servers, which gives the astronomical data processing program parallel computing capability. A user can therefore achieve large-scale parallel execution without modifying the serial code of the original astronomical data processing program, which improves data processing efficiency.
In addition, the invention solves the problem of parallelizing an astronomical data processing program that cannot be modified for parallelization, either because its source code is unavailable or because its code is too complex.
Drawings
Fig. 1 is a diagram of a computing cluster structure of an astronomical data parallel processing apparatus according to an embodiment of the present invention.
Fig. 2 is a flowchart of an astronomical data parallel processing method according to an embodiment of the present invention.
[Description of reference numerals]
1. management server; 2. storage server; 3. computing server
Detailed Description
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
In the present invention, "disposed on" or "attached to" covers direct contact with one or more components. Furthermore, ordinal numbers such as "first" and "second" used in the description and in the claims to modify a claimed element do not by themselves imply any preceding ordinal number, nor do they indicate the order in which one element is presented relative to another or the order of manufacture; they are used merely to distinguish one element having a certain name from another element having the same name. In a large-scale data processing environment, different processes need to execute different computing tasks or handle different input data. The usual approach is to redesign the original program for parallelism based on the MPI framework, so that different processes perform different actions according to their process numbers and execute correctly in parallel in the HPC environment; otherwise, although multiple processes are started, they all execute the same commands, and the goal of processing data in parallel is not achieved.
Fig. 1 is a computing cluster structure diagram of an astronomical data parallel processing apparatus according to an embodiment of the present invention. As shown in Fig. 1, the computing cluster includes a management server 1, a storage server 2, and a plurality of computing servers 3.
The storage server 2 is configured to store a plurality of astronomical data files, parameter files and astronomical data processing instructions. The instructions are executed by the plurality of computing servers 3 at the same time, and their execution includes the following steps:
simultaneously extracting parameters from the parameter file; and initiating an astronomical data processing task and distributing the astronomical data to the plurality of computing servers 3 for parallel operation.
In this embodiment, the management server 1, the storage server 2, and the computing servers 3 are connected via an Ethernet network and operate in the same network segment.
The present invention further provides a method for parallel processing of astronomical data. Fig. 2 is a flowchart of the method according to an embodiment of the present invention. As shown in Fig. 2, the method includes:
running a starting module;
distributing the parallel module to the plurality of computing servers 3 for execution, and assigning a task number to the astronomical data processing program;
starting the astronomical data processing program, extracting parameters from a parameter file in the storage server 2 according to the task number, and inputting the parameters into the astronomical data processing program; and
processing the astronomical data file in the storage server 2 according to the parameters, and saving the result.
In the present embodiment, the astronomical data processing program runs on the plurality of computing servers 3 at the same time. The parallel module also runs on the plurality of computing servers 3 at the same time, extracts parameters from the parameter file, and starts the astronomical data processing program; the parallel module complies with the MPI standard. The starting module runs on the management server 1, initiates an astronomical data processing task, and distributes the parallel module to the plurality of computing servers 3 for execution.
In addition, the parameter file is a text file in which each line contains a task number and the astronomical data file assigned to the astronomical data processing program with that task number.
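The parameter file layout described above can be illustrated with a short Python sketch. The patent states only that each line holds a task number and the assigned astronomical data file; the whitespace separator, the zero-based numbering, the example paths, and the function name `parse_parameter_file` are assumptions made for illustration.

```python
def parse_parameter_file(text):
    """Map each task number to the list of parameters on its line.

    Assumed line layout: "<task number> <data file> [further parameters]",
    separated by whitespace.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        task_number = int(fields[0])
        if task_number in table:
            raise ValueError(f"duplicate task number {task_number}")
        table[task_number] = fields[1:]
    return table


# Hypothetical parameter file content, for illustration only.
EXAMPLE = """\
0 /data/spectra/obs_0000.fits
1 /data/spectra/obs_0001.fits
2 /data/spectra/obs_0002.fits
"""

if __name__ == "__main__":
    parameters = parse_parameter_file(EXAMPLE)
    print(parameters[1])
```

Keeping one line per task means a process needs nothing but its own task number to find its input.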
The astronomical data parallel processing method of the present invention is further described below with reference to a specific embodiment. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present invention. It will be evident, however, that one or more embodiments may be practiced without these specific details, and those skilled in the art will appreciate that the following specific details are not to be construed as limiting the invention.
In an exemplary embodiment of the present invention, there are 60 computing servers 3. The storage server 2 includes 60 hard disks, each with a capacity of 10 TB, and is mounted at a designated directory on each of the 60 computing servers 3 through a network file system.
The astronomical data files are spectral observation data, 1680 files in total, and the astronomical data processing program is a spectral parameter extraction program. The parameter file contains the parameters required by the spectral parameter extraction program and is stored in the storage server 2. The parallel module obtains the astronomical data parameters, namely the spectral parameters, from the corresponding line of the parameter file according to the task number of the astronomical data processing program, and the astronomical data processing program then processes the astronomical data files in the storage server 2 according to those parameters.
The parallel module is responsible for parsing the parameter file and inputting different parameters into the same astronomical data processing program; the instances of the astronomical data processing program run on different computing servers 3, which achieves the parallel effect.
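The role of the parallel module can be sketched in Python as a thin wrapper around the unmodified serial program. This is not the patent's scaleMpi implementation, whose source is not disclosed. As a stand-in for calling `MPI_Comm_rank`, the sketch reads the rank from the environment variables that common launchers set (`PMI_RANK` for Hydra-based launchers such as MPICH and Intel MPI, `OMPI_COMM_WORLD_RANK` for Open MPI). The function names, the command-line layout, and the whitespace-separated parameter file format are all assumptions.

```python
import os
import subprocess
import sys

# Environment variables through which common MPI launchers expose the rank.
RANK_VARS = ("PMI_RANK", "OMPI_COMM_WORLD_RANK")


def task_number(environ):
    """Return this process's task number from the launcher's environment."""
    for name in RANK_VARS:
        if name in environ:
            return int(environ[name])
    raise RuntimeError("no rank variable found; was this started by mpirun?")


def parameters_for(task, parameter_path):
    """Return the parameters on the line whose first field equals `task`."""
    with open(parameter_path) as handle:
        for line in handle:
            fields = line.split()
            if fields and int(fields[0]) == task:
                return fields[1:]
    raise KeyError(f"task {task} not found in {parameter_path}")


def build_command(program, task, parameter_path):
    """Assemble the command line for the unmodified serial program."""
    return [program, *parameters_for(task, parameter_path)]


# Hypothetical usage: wrapper.py <serial_program> <parameter_file>
if __name__ == "__main__" and len(sys.argv) == 3:
    program, parameter_path = sys.argv[1], sys.argv[2]
    command = build_command(program, task_number(os.environ), parameter_path)
    sys.exit(subprocess.run(command).returncode)
```

Because the serial program only ever sees ordinary command-line arguments, it needs no knowledge of MPI, which matches the middleware role the description assigns to the parallel module.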
In this embodiment, the command for running the astronomical data processing task is:
mpirun -np 1680 -hosts cu01,cu02,cu03,cu04,...,cu60 -ppn 28 /opt/software/scaleMpi execfile -f filelist
where "mpirun" is the command that runs the starting module;
"cu01,cu02,cu03,cu04,...,cu60" are the identifiers of the computing servers 3;
"scaleMpi" is the parallel module;
"execfile" is the astronomical data processing program;
"filelist" is the parameter file; and
"-np 1680" starts 1680 processes, one for each of the 1680 astronomical data files to be processed.
In this embodiment, the parallel module inputs different parameters to the astronomical data processing program on different computing servers 3, so that the serial astronomical data processing program processes different astronomical data files at the same time and thus gains parallel computing capability. A user can achieve large-scale parallel execution without modifying the serial code of the original astronomical data processing program, which improves data processing efficiency.
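The numbers in the launch command of this embodiment are consistent with one another: 60 computing servers with 28 processes per server give 1680 processes, one per data file. The Python sketch below checks that arithmetic and shows which server a given task would land on. The host names follow the `cu01` to `cu60` pattern shown in the command, and block placement (consecutive ranks filling one host before the next) is an assumption about the launcher, not something the patent states.

```python
def host_for_rank(rank, hosts, ppn):
    """Return the host a rank lands on under block placement.

    Block placement (an assumption): ranks 0..ppn-1 go to the first host,
    the next ppn ranks to the second host, and so on.
    """
    total = len(hosts) * ppn
    if not 0 <= rank < total:
        raise ValueError(f"rank must be in [0, {total})")
    return hosts[rank // ppn]


# Values taken from the embodiment: 60 servers, 28 processes per server.
HOSTS = [f"cu{i:02d}" for i in range(1, 61)]
PPN = 28
TOTAL = len(HOSTS) * PPN  # number of processes requested with -np

if __name__ == "__main__":
    print(TOTAL)
    print(host_for_rank(0, HOSTS, PPN), host_for_rank(TOTAL - 1, HOSTS, PPN))
```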
In other embodiments of the present invention, a job management system such as PBS (Portable Batch System) may additionally be configured on the management server 1. The job management system can arrange a plurality of astronomical data processing tasks into a task queue and execute them in sequence or automatically at a set time.
According to the embodiment of the invention, even when the astronomical data processing program has no available source code, or its code is too complex to be modified for parallelization, the same astronomical data processing program can still be used to process different astronomical data files simultaneously, achieving parallel computing capability.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. An astronomical data parallel processing apparatus comprising:
a computing cluster comprising a management server, a storage server and a plurality of computing servers;
wherein the storage server is used for storing a plurality of astronomical data files, parameter files and astronomical data processing instructions, the instructions are executed by the plurality of computing servers at the same time, and the execution comprises the following steps:
simultaneously extracting parameters from the parameter file; and initiating an astronomical data processing task, and distributing the astronomical data to the plurality of computing servers in parallel for operation.
2. The device of claim 1, wherein the management server, the storage server, and the computing servers are connected by an Ethernet network and operate within the same network segment.
3. A method of parallel processing of astronomical data applying the apparatus of claim 1, comprising:
running a starting module;
distributing the parallel module to the plurality of computing servers for operation, and assigning a task number to the astronomical data processing program;
starting the astronomical data processing program, extracting parameters from a parameter file in the storage server according to the task number, and inputting the parameters into the astronomical data processing program; and
processing the astronomical data file in the storage server according to the parameters, and saving the result.
4. The method of claim 3, wherein the astronomical data processing program runs in the plurality of computing servers simultaneously.
5. The method of claim 4, comprising:
running the parallel module in the plurality of computing servers simultaneously, the parallel module extracting parameters from the parameter file and starting the astronomical data processing program.
6. The method of claim 5, comprising:
running the starting module in the management server, the starting module initiating an astronomical data processing task and distributing the parallel module to the plurality of computing servers for running.
7. The method of claim 6, wherein the parallel module complies with the MPI standard specification.
8. The method of claim 7, wherein the parameter file is in a text file format, and each line comprises a task number and an astronomical data file allocated to the astronomical data processing program corresponding to the task number.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910693839.0A CN110543361B (en) 2019-07-29 2019-07-29 Astronomical data parallel processing device and astronomical data parallel processing method


Publications (2)

Publication Number Publication Date
CN110543361A (en) 2019-12-06
CN110543361B (en) 2023-06-13

Family

ID=68710387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910693839.0A Active CN110543361B (en) 2019-07-29 2019-07-29 Astronomical data parallel processing device and astronomical data parallel processing method

Country Status (1)

Country Link
CN (1) CN110543361B (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050198104A1 (en) * 2004-01-29 2005-09-08 Kwon Oh K. System and method for grid MPI job allocation using file-based MPI initialization in grid computing system
US20090172353A1 (en) * 2007-12-28 2009-07-02 Optillel Solutions System and method for architecture-adaptable automatic parallelization of computing code
CN102959517A (en) * 2010-06-10 2013-03-06 Otoy公司 Allocation of gpu resources accross multiple clients
US20140325073A1 (en) * 2010-06-10 2014-10-30 Otoy, Inic Allocation of gpu resources across multiple clients
US8260840B1 (en) * 2010-06-28 2012-09-04 Amazon Technologies, Inc. Dynamic scaling of a cluster of computing nodes used for distributed execution of a program
CN103034534A (en) * 2011-09-29 2013-04-10 阿尔斯通电网公司 Electric power system analysis parallel computing method and system based on grid computation
US20160182620A1 (en) * 2014-03-26 2016-06-23 Hitachi, Ltd. Data distribution apparatus, data distribution method, and data distribution program for parallel computing processing system
US20160026553A1 (en) * 2014-07-22 2016-01-28 Cray Inc. Computer workload manager
CN104537180A (en) * 2015-01-04 2015-04-22 中国科学院国家天文台南京天文光学技术研究所 Numerical simulation method of astronomical site selection atmospheric optical parameter measurement instrument
CN104615487A (en) * 2015-01-12 2015-05-13 中国科学院计算机网络信息中心 System and method for optimizing parallel tasks
US20170351536A1 (en) * 2016-06-02 2017-12-07 Hewlett Packard Enterprise Development Lp Provide hypervisor manager native api call from api gateway to hypervisor manager
GB201717138D0 (en) * 2016-11-28 2017-12-06 National Univ Of Defense Technology Spark-based imaging satellite task preprocessing parallelization method
CN108805187A (en) * 2018-05-29 2018-11-13 北京佳格天地科技有限公司 Celestial spectrum sequence automatic classification system and method

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
NAVTEJ SINGH等: "Parallel astronomical data processing with Python: Recipes for multicore machines", 《ASTRONOMY AND COMPUTING 》 *
宋烜等: "用MapReduce框架构建虚拟天文台数据节点", 《天文研究与技术》 *
徐星宇: "多GPU-CPU混合异构平台下的光谱计算优化", 《中国优秀硕士学位论文全文数据库 基础科学辑》 *
杨哲睿等: "大规模天文数据分析及多维信息可视化平台的建设和管理", 《科研信息化技术与应用》 *
梁胤程等: "铁路路基状态检测中探地雷达数据并行处理", 《中国铁道科学》 *
郑裕民等: "基于GPU超级计算系统的天文科学应用", 《科研信息化技术与应用》 *

Also Published As

Publication number Publication date
CN110543361B (en) 2023-06-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant