CN114661637A - Data processing system and method for radio astronomical data intensive scientific operation - Google Patents

Data processing system and method for radio astronomical data intensive scientific operation

Info

Publication number
CN114661637A
CN114661637A (application number CN202210187329.8A)
Authority
CN
China
Prior art keywords
data
nodes
computing node
memory
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210187329.8A
Other languages
Chinese (zh)
Other versions
CN114661637B (en)
Inventor
安涛 (An Tao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Astronomical Observatory of CAS
Original Assignee
Shanghai Astronomical Observatory of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Astronomical Observatory of CAS filed Critical Shanghai Astronomical Observatory of CAS
Priority to CN202210187329.8A priority Critical patent/CN114661637B/en
Publication of CN114661637A publication Critical patent/CN114661637A/en
Application granted granted Critical
Publication of CN114661637B publication Critical patent/CN114661637B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0893 Caches characterised by their organisation or structure
    • G06F 12/0897 Caches characterised by their organisation or structure with two or more cache hierarchy levels
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/084 Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0842 Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10 File systems; File servers
    • G06F 16/18 File system types
    • G06F 16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data processing system for radio astronomical data intensive scientific operation, comprising at least one data constellation. Each data constellation is a scalable, integrated data unit installed in one cabinet or in several adjacent cabinets and composed of a scalable distributed storage system, a hybrid heterogeneous computing-node system and a network system. Each computing node is physically integrated with its own ultra-large-capacity memory and a flash-memory local storage unit; the storage system consists of the local storage unit of each computing node and a distributed file system formed by the storage nodes. Each data constellation has an independent distributed file system. The invention also provides a corresponding method. By adopting a data-constellation architecture for big data, in which every data constellation has an independent shared file system, the system meets the computing and storage demands of astronomical big data and greatly reduces the problems caused by the traditional global file system.

Description

Data processing system and method for radio astronomical data intensive scientific operation
Technical Field
The invention belongs to the fields of radio astronomical data processing, data-intensive scientific computing, big data and high-performance computing, and particularly relates to a data processing system and method for radio astronomical data intensive scientific operation.
Background
With the construction and operation of new, advanced astronomical observation facilities, the astronomical community faces the challenges of ultra-large-scale data and data-intensive scientific computing. For example, the Square Kilometer Array (SKA) telescope, built through global collaboration, is the largest astronomical observation facility ever planned by the international astronomical community, and the largest international big-science project in astronomy in which China participates. SKA combines a large number of small-aperture antennas to perform synthetic-aperture radio interferometric imaging; its total collecting area reaches one square kilometer, its sensitivity is 50 times and its survey speed 10000 times that of the largest existing radio telescope, offering a great opportunity for humanity to understand the universe. In its first phase (2021-2029), SKA is expected to produce up to 710 petabytes of scientific data per year (1 PB = 1024 TB, i.e. more than one million GB). The international SKA regional centers that compute and store these scientific data need a processing platform with 300 PFlops of computing power (3×10^17 floating-point operations per second), of which at least 20 PFlops is reserved for subsequent scientific analysis, and data exchange between the regional-center nodes of the participating countries requires a stable average network speed of 100 Gbps (100-gigabit Ethernet). By 2029, the total volume of data stored in the international SKA regional centers is expected to reach 2 EB (1 EB = 1024 PB). Existing supercomputing platforms cannot meet this goal, so the international SKA organization is developing advanced data processing platforms.
Analyzing and processing such ultra-large-scale data (PB level) is a common challenge at the interface of astronomy and computer science, and the success of the big-data-driven SKA telescope depends on the ability of the regional centers to solve this world-class problem. Processing SKA scientific data is a typical data-intensive computing task, and its workload pattern differs greatly from that of traditional supercomputing, which is built around compute-intensive workloads.
Traditional supercomputing platforms have small local storage, limited shared-memory capacity, long data-access latency and a single, uniform system architecture, and are therefore ill-suited to pipelined processing of the emerging ultra-large-scale data. In addition, their storage architecture relies heavily on a shared (global) file system, which leads to higher failure rates, or even system-wide paralysis, when processing data at SKA scale. The global, multi-user usage pattern of the SKA project would further degrade the scientific work of users on traditional supercomputing platforms.
In the era of big data and artificial intelligence, the shift from compute-intensive to data-intensive workloads is increasingly evident. How to rapidly process massive data with complex structure, diverse types, many dimensions and large volume is the core problem of data-intensive scientific computing, of which astronomical big data is a representative example.
Disclosure of Invention
The object of the invention is to provide a data processing system and a data processing method for radio astronomical data intensive scientific operation, so as to improve the processing speed of data-intensive scientific workloads.
To achieve the above object, the present invention provides a data processing system for radio astronomical data intensive scientific operation, comprising at least one data constellation. Each data constellation is a scalable, integrated data unit installed in one cabinet or in a plurality of adjacent cabinets and is composed of a scalable distributed storage system, a hybrid heterogeneous computing-node system and a network system. Each computing node is physically integrated with its own ultra-large-capacity memory and a flash-memory local storage unit; the storage system consists of the local storage unit of each computing node and a distributed file system formed by the storage nodes. Each data constellation has an independent distributed file system.
The hybrid heterogeneous computing-node system includes at least computing nodes of the x86 CPU architecture, the ARM architecture and the x86 CPU + GPU architecture.
When a computing node uses the ARM architecture, its total memory-access bandwidth is 80 GB/s; when a computing node uses the CPU + GPU architecture, its memory-access bandwidth is 2 TB/s.
The total capacity of the ultra-large-capacity memory of each computing node is 1 TB to 2 TB; it is adjusted according to the number of CPU cores so that the memory available to each core is not lower than 32 GB.
For a compute node with 32 cores, its total memory capacity is at least 1 TB.
The local storage unit adopts NVMe SSD, and the storage node adopts HDD.
The distributed file system adopts a fully distributed architecture and a fully symmetric distributed architecture.
The network system comprises a plurality of IB switches connected with all computing nodes and storage nodes, network switches connected with all computing nodes, storage nodes, background storage nodes and management nodes, and a plurality of user login nodes connected with the management nodes through the Internet.
In another aspect, the present invention provides a data processing method for radio astronomical data intensive scientific operations, comprising:
S0: providing a data processing system for radio astronomical data intensive scientific operation as described above;
S1: sending the raw data through an IB switch into the ultra-large-capacity memory of the current computing node to serve as an ultra-large memory cache;
S2: the current computing node processes its task and judges whether the result is intermediate data or final data; if it is intermediate data, the process proceeds to step S3; if it is final data, the final data are stored in the ultra-large-capacity memory or the local storage unit of the computing node, written back through an IB switch to the distributed file system of the storage nodes, and the process ends;
S3: the current computing node stores the obtained intermediate data, according to the storage requirement, either in the ultra-large-capacity memory as an ultra-large memory cache or in the flash-memory local storage unit as a flash cache;
S4: computing nodes of an architecture type different from that of the current computing node read the ultra-large memory cache or the SSD cache of the current computing node through an IB switch to exchange intermediate data between computing nodes, then act as the new current computing node and return to step S2.
The data processing system of the invention targets astronomical big data and introduces a data-constellation architecture in which each data constellation has an independent shared file system. It not only meets the computing and storage requirements of astronomical big data, but also greatly reduces the problems brought by the global file system used in traditional supercomputing architectures. Moreover, this design allows processors of different models to be assigned to different processing tasks, and the local storage and network to be customized and used efficiently as required.
The advantages of the data processing system for radio astronomical data intensive scientific operation are as follows. (1) The system is designed for the huge data volume of astronomical big data and consists of a hybrid heterogeneous computing-node system, a high-performance storage system and a high-speed network system that are physically installed together in a data-constellation architecture, replacing the traditional supercomputing design in which the three systems are independent. Resources can be flexibly allocated according to the requirements of a computing task, and a task can be completed by one data constellation or by several, satisfying application scenarios such as multiple scientific data-processing pipelines, diverse user requirements, different computing scales and distributed tasks. (2) The large memory capacity of a single node solves the problem of processing a single large data file and avoids or reduces the time cost of data cutting, data movement and idle waiting. In addition, the large memory allows files that must be read many times to stay resident in memory for a long period and be accessed by several nodes, greatly reducing the time spent on repeated reads and accelerating the data-processing pipeline. (3) By reasonably assigning the compute-intensive, memory-intensive and data-intensive tasks of a pipeline to the corresponding computing devices, the hybrid heterogeneous computing architecture effectively copes with complex astronomical data-processing pipelines, large numbers of data files and high parallelism, improves the efficiency of the whole cluster and saves operating cost. (4) The multi-level hybrid storage system, comprising SSDs and HDDs, guarantees high-performance reading and writing and meets wide application requirements such as high-performance computing, high data I/O and multi-load tasks. In addition, the distributed storage architecture provides a high-throughput, highly concurrent and highly scalable storage mechanism, ensuring that performance still grows almost linearly when the data center is scaled up to meet the demands of an ever-growing number of scientific users. (5) A high-speed network system of up to 200 Gbps, with a topology designed for the hybrid heterogeneous computing nodes, connects the computing, storage and management equipment; it resolves the most serious data I/O bottleneck in data-intensive computing, guarantees smooth data exchange within and between nodes, reduces the latency of data flows and lowers the risk of system breakdown caused by network traffic.
Drawings
FIG. 1 is a system architecture diagram of a data processing system for radio astronomical data intensive scientific arithmetic, according to one embodiment of the present invention.
FIG. 2 is a workflow diagram of a data processing system for radio astronomical data intensive scientific arithmetic according to one embodiment of the present invention.
FIG. 3 is a schematic diagram of a connection of a network system of a data processing system for radio astronomical data intensive scientific arithmetic, according to an embodiment of the present invention.
FIG. 4 is a workflow diagram of a data processing system for radio astronomical data intensive scientific arithmetic according to one embodiment of the present invention.
FIG. 5 is a workflow diagram of a data processing system for radio astronomical data intensive scientific arithmetic according to another embodiment of the present invention.
FIG. 6 is a workflow diagram of a data processing system for radio astronomical data intensive scientific arithmetic according to yet another embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The invention provides a data processing system for radio astronomical data intensive scientific operation, which is suitable for radio astronomical data. In addition, the system is also applicable to big-data processing in other areas of astronomy and in fields such as genomics, biology and medicine.
Fig. 1 shows a data processing system for radio astronomical data intensive scientific operation according to an embodiment of the present invention, comprising at least one data constellation. The scale and configuration of a single data constellation can be adjusted according to actual requirements, and several data constellations can be connected and combined to form a more powerful cluster.
Each data constellation is a scalable integrated data unit mounted on a cabinet or adjacent cabinets, and is composed of a scalable distributed storage system 100, a hybrid heterogeneous computing node system 200, and an ultra-high speed low latency network system 300.
As shown in fig. 1, "hybrid heterogeneous" refers to a mix of heterogeneous computing architectures. Each computing node 200 may adopt one of the x86 architecture, the ARM architecture and the GPU architecture, or a combination of several of them; the combination can be customized freely and flexibly to suit different types of astronomical data (e.g. continuum visibility data, time-domain data and spectral-line data), different computing requirements (e.g. data-intensive, compute-intensive and memory-intensive) and the different processing requirements of different steps within the same pipeline, so as to better match the specific tasks (e.g. compute-intensive, data-intensive and memory-intensive) at different stages of data processing and thereby satisfy astronomical data processing in multiple application scenarios. In this embodiment, the hybrid heterogeneous computing-node system 200 includes at least three system architectures: the x86 CPU architecture, the ARM architecture and the x86 CPU + GPU architecture (GPU architecture for short). In other words, the hybrid heterogeneous computing system uses x86 CPUs and GPUs simultaneously as heterogeneous computing nodes, and mixes heterogeneous computing nodes of several different architecture types with traditional CPU computing nodes.
In this embodiment, the computing nodes of the x86 architecture are Intel Xeon Gold 6132 CPU nodes. Each node has two CPUs with a clock frequency of 2.6 GHz and 28 computing cores in total, a theoretical peak of 18 trillion floating-point operations, and is configured with 1 TB of memory and a 4 TB flash memory (SSD), making it an ideal choice for large-scale imaging tasks. The computing nodes of the ARM architecture are 10 Huawei Kunpeng 920 nodes; each node has 96 cores at a clock frequency of 2.6 GHz, a theoretical peak of 10 trillion floating-point operations, 1 TB of memory in total and 600 GB of flash memory, and is suited to the highly parallel processing of multi-channel spectral-line data. The computing nodes of the CPU + GPU architecture are heterogeneous x86 CPU + GPU systems configured with 36 CPU cores and 8 Nvidia Tesla V100 cards, a theoretical peak of 62.4 trillion floating-point operations, 1 TB of memory and 7.7 TB of flash memory, and are suited to digital beamforming of time-domain data, source-searching steps in the imaging pipeline, artificial-intelligence-related scientific applications and the like. When a computing node uses the ARM architecture, its total memory-access bandwidth is 80 GB/s; when it uses the CPU + GPU architecture, its memory-access bandwidth is 2 TB/s.
Each data constellation is provided with a corresponding distributed storage system 100, which is installed in the cabinet(s) where the data constellation is located and stores the required original files. The distributed storage systems 100 of different data constellations are independent of one another and are interconnected through the ultra-high-speed, low-latency network system 300.
Each computing node is physically integrated with its own ultra-large-capacity memory and a flash-memory (SSD) local storage unit. The storage system 100 consists of the local storage unit 101 of each computing node and a distributed file system 102 formed by the storage nodes; each data constellation has an independent distributed file system 102. The local storage units and the distributed file system of the storage system 100 therefore cooperate with the ultra-large-capacity memory on each computing node (the total memory capacity of a single computing node is 1 TB to 2 TB, and the ratio of total memory capacity to number of cores exceeds 32 GB; for example, a node with 32 cores must have more than 1 TB of memory in total) to exploit the data I/O bandwidth, so that data interaction between computing nodes is used effectively and the data-interaction bottleneck between the computing nodes and the storage system 100 is reduced. Data interaction here refers to the ultra-large-bandwidth data streams that occur between data constellations, among the computing nodes within a data constellation, and inside a single computing node. The local storage unit on a computing node uses an NVMe flash memory (SSD), which provides fast data exchange and caching within the node. The storage nodes use conventional HDDs. According to the bandwidth requirements of different processing stages and the characteristics of the data being processed, the distributed file system adopts a fully distributed and fully symmetric architecture, giving it high-speed read/write performance and large-scale scalability.
As shown in fig. 3, the network system 300 consists of two parts: one is a set of InfiniBand networks (IB switches for short; bandwidth 100 Gbps, reaching 200 Gbps between some nodes) connecting all the computing nodes and storage nodes; the other is an Ethernet (maximum bandwidth 10 Gbps) connecting to data centers on other continents, which comprises network switches connected to all computing nodes, storage nodes, background storage nodes and management nodes, and a plurality of user login nodes connected to the management nodes through the Internet.
The computing nodes and the storage nodes are like individual stars in a constellation: any two of them are connected through the network system 300.
The workflow of the data processing system for radio astronomical data intensive scientific arithmetic of the present invention and the advantages of the data processing system compared to a conventional supercomputer are specifically described below.
In a conventional supercomputer the three systems, i.e. computing nodes, storage system and network system, are physically separated: for example, one row of racks holds the computing nodes, another the storage system and another the network system. This design suits compute-intensive workloads, but for data-intensive workloads it becomes very inefficient: the data volume is enormous, data movement becomes the biggest bottleneck, and while data are not yet in place the computing nodes sit idle, wasting running cost.
The invention adopts the data constellation, which changes the traditional supercomputing design in which these three systems are physically separated. The idea behind the data-constellation design is to integrate the three systems organically, bringing the computation closer to the data and minimizing the cost of data movement. Resources can be flexibly allocated according to the requirements of a computing task, and a task can be completed by one data constellation or by several, satisfying application scenarios such as multiple scientific data-processing pipelines, diverse user requirements, different computing scales and distributed tasks. With this design, most data processing can be completed within one data constellation, which avoids the heavy data exchange between separate computing nodes and separate external storage nodes in a traditional supercomputing system, greatly reduces energy consumption (saving roughly one third to one half of the data-operation cost of a traditional supercomputing platform), saves part of the network equipment, and lowers the overall construction cost of the data center.
The key programs in a radio astronomical data-processing pipeline must call data held in memory for read and write operations, so the memory capacity determines the performance of the pipeline. Data read into the system include, but are not limited to, raw data from SKA telescope observations, which are usually in the standard astronomical FITS (Flexible Image Transport System) format or the MS (Measurement Set) format, although other astronomical data formats may also be used. A single piece of raw data from an SKA pilot telescope typically amounts to tens of gigabytes (1 GB = 1024 MB) or even tens of terabytes (1 TB = 1024 GB).
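As a purely illustrative aside (not part of the claimed invention), the following minimal Python sketch shows how the size of such a raw file translates into the memory needed to hold it; the file name is hypothetical, and astropy's memory-mapped reading is used so that the file is not loaded all at once.

    # Illustrative only (not part of the claimed invention): inspect a large FITS file
    # and estimate the memory needed to load it in full. The file name is hypothetical.
    import os
    from astropy.io import fits  # standard astronomy I/O library

    path = "ska_observation.fits"                    # hypothetical raw-data file
    print(f"Size on disk: {os.path.getsize(path) / 1024**3:.1f} GB")

    with fits.open(path, memmap=True) as hdul:       # memmap avoids loading everything at once
        hdul.info()                                  # list HDUs, dimensions and data types
        data = hdul[0].data
        if data is not None:
            # Holding the whole array in RAM requires roughly data.nbytes of memory.
            print(f"Full in-memory size: {data.nbytes / 1024**3:.1f} GB")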
A computing node in a traditional compute-oriented supercomputing center usually has only 64 GB or 128 GB of memory, so the system obviously cannot read in an observation data set at one time. The single-core serial programs or practices common in radio astronomy are likewise unsuited to processing such large volumes of observation data, especially raw data with up to about 65000 frequency channels such as those from SKA telescopes, where the data of each channel are processed independently, combined and written to a file, and the visibility data are then converted into images by an inverse Fourier transform. This process involves a large amount of data interaction in memory. In conventional supercomputing, data processing on a single computing node usually uses shared memory, while processing across several computing nodes uses distributed memory. However, only a limited amount of data can be read in at a time; when a single data file exceeds a certain size the computation cannot be completed in one pass, and in actual operation astronomers adopt a data-cutting scheme: the data are first cut into several blocks in chronological order, and each block is then read in and processed in turn. Experience shows that cutting the data and loading the pieces into memory consumes a large fraction of the run time, and the fewer the slices, the lower the overall time cost. Using distributed memory and spreading the computation over several computing nodes in parallel does not solve the problem effectively either, because distributing the computing task over several nodes means that the data must be transmitted or copied to those nodes, which costs additional time for moving data between nodes. In summary, traditional computing platforms designed for compute-intensive application scenarios sacrifice access time to balance cost and benefit, which is extremely time- and labor-consuming for the data-intensive application scenarios of astronomy.
The invention designs a memory scheme for the characteristics of data-intensive scientific computing such as the processing of astronomical observation data. Specifically, in the invention a large memory capacity is allocated to every computing node in every data constellation, so that the processing of a data file is completed in a single pass whenever possible, or the number of data slices is reduced as far as possible. In a concrete implementation of this scheme, the total capacity of the ultra-large-capacity memory of a single computing node is about 1 TB to 2 TB, so that radio astronomical data of about 100 GB can be processed in one pass on one computing node, and even data files of several hundred GB need to be cut into only a few blocks.
The memory scheme of the invention is reflected not only in the large total memory capacity of a computing node but also in the large amount of memory that can be allocated to each core of the node (i.e. of an x86/ARM/GPU processor): the ratio of total node memory to number of cores exceeds 32 GB; for example, a node with 32 cores must have more than 1 TB of memory in total. Big-data processing for modern telescopes is essentially accelerated by parallel processing: each thread of the parallelized pipeline is driven by one core of an x86/ARM/GPU processor, and the amount of data-processing work a thread can perform is limited by the memory allocated to it. Traditional supercomputing platforms, whose main business is computation, pursue as many computing cores as possible and, because their data volumes are small, do not allocate much memory: the cores of a computing node usually share memory with a standard configuration of 64 GB or 128 GB, while a single node may have as many as 44 cores (e.g. Intel Xeon Gold 6152), 68 cores (e.g. Intel Xeon Phi 7250) or even 96 cores (Huawei Kunpeng 920), so that each core is allocated only about 1.88 GB of memory on average. As explained above, however, a data-intensive task cannot slice its data too finely (otherwise the overall processing time increases), which simply sacrifices the advantage of having many computing cores. For example, if a 100 GB file is divided into 5 parts in time order and assigned to a computing node of a traditional supercomputer, only 20 of the 68 cores in that node can be used (1 GB of memory per thread). In other words, although the node has 68 cores, in practice more than half of them sit idle and the performance of the system is not used efficiently.
To solve this problem, the invention adopts a design that considers both the total memory capacity of a computing node and the average memory per core. According to one embodiment of the invention, an Intel Xeon Gold 6132 node is used as at least one computing node: its 28 computing cores share 1 TB of memory, i.e. about 36.6 GB per core on average. A 100 GB file then needs no cutting at all and can use all 28 computing cores (the input data occupy only about 10% of the memory allocated to each core), so the parallel acceleration of the data flow is handled perfectly.
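Purely as an illustration of this sizing argument (not part of the claimed invention), the short Python sketch below reproduces the arithmetic: given a file size, a core count and a total memory capacity, it reports the memory available per core and the share of it occupied by the input data. The figures are those of the embodiment above.

    # Illustrative sketch of the per-core memory arithmetic above (not part of the claimed invention).
    def memory_plan(file_gb: float, n_cores: int, total_mem_gb: float):
        """Return per-core memory (GB), per-core input data (GB) and the occupied fraction."""
        mem_per_core = total_mem_gb / n_cores        # memory available to each core
        data_per_core = file_gb / n_cores            # input data handled by each core
        return mem_per_core, data_per_core, data_per_core / mem_per_core

    # Embodiment from the text: Intel Xeon Gold 6132 node, 28 cores, 1 TB of memory, 100 GB file.
    mem_per_core, data_per_core, fraction = memory_plan(100, 28, 1024)
    print(f"{mem_per_core:.1f} GB/core available, {data_per_core:.1f} GB/core of input data "
          f"({fraction:.0%} of per-core memory)")
    # -> about 36.6 GB/core available, 3.6 GB/core of data, roughly 10% of per-core memory,
    #    so the 100 GB file needs no cutting and all 28 cores can work in parallel.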
In a word, the large memory capacity on a single computing node solves the problem of processing a single large-size data file, avoids or reduces the time cost of data cutting, data moving and idle waiting, and greatly improves the operating efficiency of a data constellation.
A key limiting factor in data-intensive computing is the data I/O limit. Data-intensive computing differs fundamentally from compute-intensive high-performance computing (HPC) in that it is oriented to the storage, management, acquisition and processing of large data, and most of the processing time is spent on I/O and on moving and copying data. Parallel processing of data-intensive tasks typically dices the data into small blocks, each of which is processed independently by the same application, so the whole data processing system must be specially designed so that the degree of parallelism can grow as the amount of data increases.
FIG. 2 is a workflow diagram of a data processing system for radio astronomical data intensive scientific arithmetic according to one embodiment of the present invention. As shown in fig. 2, the data processing method for the radio astronomical data intensive scientific operation includes:
Step S0: providing a data processing system for radio astronomical data intensive scientific operation as described above;
Step S1: sending the raw data through an IB switch into the ultra-large-capacity memory of the current computing node to serve as an ultra-large memory cache (Cache);
Step S2: the current computing node processes its task and judges whether the result is intermediate data or final data; if it is intermediate data, the process proceeds to step S3; if it is final data, the final data are stored in the ultra-large-capacity memory or the flash-memory local storage unit of the computing node, written back through an IB switch to the distributed file system of the storage nodes, and the process ends;
Step S3: the current computing node stores the obtained intermediate data, according to the storage requirement, either in the ultra-large-capacity memory as an ultra-large memory cache (Cache) or in the flash-memory local storage unit as a flash cache (SSD Cache);
According to the different data-I/O requirements of the different stages of a data-processing pipeline, the storage system uses the ultra-large-capacity memory and the flash-memory local storage units as multi-level hybrid storage media; this provides safety, high-speed reading and fast recoverable reconstruction, supports the storage, management, invocation and reuse of data over their full life cycle, and satisfies application requirements such as high-performance computing, high data I/O and multi-load tasks.
Step S4: computing nodes of an architecture type different from that of the current computing node read the ultra-large memory cache or the SSD cache of the current computing node through an IB switch to exchange intermediate data between computing nodes, then act as the new current computing node and return to step S2.
In the data processing system, the I/O constraint is solved by jointly optimizing and designing the ultra-large-capacity memory of the computing node, the high-performance storage system and the high-speed low-delay network system.
1) First, the ultra-large-capacity memory of 1 TB to 2 TB is the most frequently accessed and most interaction-intensive component; on this basis, memory-access bandwidth is given priority when selecting computing nodes. Taking a computing node of the ARM architecture type as an example, it integrates 8-channel DDR4 memory and a PCIe-attached 100 G Ethernet card, and its total memory-access bandwidth reaches 80 GB/s; taking a computing node of the CPU + GPU architecture type as an example, its memory-access bandwidth can be extended to 2 TB/s. With this combination of ultra-large-capacity memory and high-bandwidth processors, the memory bandwidth is 46% higher and the total I/O bandwidth 66% higher than on a commercial server that has not been optimized for data-I/O constraints.
2) Second, the storage system uses high-performance distributed storage with a hybrid storage medium consisting of the SSDs of the computing nodes' local storage units and the HDDs of the storage nodes, balancing high performance and cost-effectiveness, and builds a shared distributed cache (Cache) resource pool used jointly by all business systems. A layered read-cache mechanism (the first layer being the memory cache and the second the SSD cache) shortens data-access time and keeps the average latency of ordinary 4K reads and writes at about 1 ms; a simple sketch of such a two-tier lookup is given after this list. By combining a DHT (Distributed Hash Table) algorithm with the capabilities of high-performance hardware (in the all-NVMe-SSD configuration), the data read/write time is shortened and the maximum number of concurrent users is raised from 400 to 1000, fully meeting the wide application requirements of high-performance computing, high data I/O and multi-load tasks.
3) Finally, as shown in fig. 3, the invention designs a high-speed network system of up to 200 Gbps, together with its topology, for the hybrid heterogeneous computing-node system and the distributed storage system. High-throughput, low-latency IB switches form the internal fabric that connects the computing nodes, storage nodes and management equipment, which resolves the most serious data-transfer bottleneck in data-intensive computing, guarantees smooth data exchange within and between computing nodes, reduces the latency of data flows and lowers the risk of system breakdown caused by network traffic. Inside a computing node the components are connected through local interfaces, for example an M.2 interface for the NVMe storage; the internal data-exchange bandwidth of a node is bounded by the bandwidth of the NVMe SSD cache, which further improves the I/O performance. The measured maximum I/O throughput of a single node is currently 7.4 gigabytes per second against a theoretical peak of 8.5 gigabytes per second, an I/O utilization as high as 94%, exceeding the usual standard of high-performance computing servers. In addition, the whole system is directly connected to the other SKA data centers around the world through an intercontinental Ethernet link (up to 10 Gbps) of the highest grade available to Chinese scientific research, supporting the transmission, computation and management of astronomical big data to the greatest extent.
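The layered read-cache mechanism mentioned in point 2) can be pictured with the following minimal sketch (illustrative only, not the patented implementation); the dictionary-based tiers are stand-ins for the memory cache, the NVMe SSD cache and the HDD-based distributed file system.

    # Illustrative two-tier read cache: memory cache -> SSD cache -> distributed file system.
    # The dictionary-based tiers are stand-ins, not the patented implementation.
    class TieredReadCache:
        def __init__(self, distributed_fs: dict):
            self.memory = {}            # tier 1: ultra-large-capacity DRAM on the compute node
            self.ssd = {}               # tier 2: local NVMe SSD cache on the compute node
            self.dfs = distributed_fs   # backing store: HDD-based distributed file system

        def read(self, block_id):
            if block_id in self.memory:          # fastest path: block already resident in DRAM
                return self.memory[block_id]
            if block_id in self.ssd:             # second chance: local flash cache
                data = self.ssd[block_id]
                self.memory[block_id] = data     # promote the hot block into DRAM
                return data
            data = self.dfs[block_id]            # miss: fetch from the distributed file system
            self.ssd[block_id] = data            # populate both cache tiers for later reads
            self.memory[block_id] = data
            return data

    cache = TieredReadCache(distributed_fs={"block-42": b"visibility data"})
    cache.read("block-42")                       # first read: served by the distributed file system
    print(cache.read("block-42"))                # second read: served by the in-memory tier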
The core difference between the storage system 100 of the invention and conventional SAN storage is scalability. Traditional SAN storage is extended by stacking controllers: dual or multiple controllers, up to a few dozen, are stacked, and capacity and performance are raised by vertically adding disk shelves behind the controllers (scale-up). Beyond a certain scale a bottleneck appears: adding hard disks still increases total capacity, but the controller architecture prevents performance from growing linearly any further. In the invention, the distributed file system of the storage system 100 uses a fully distributed architecture, supports up to 288 storage nodes (hundreds of PB of capacity) and provides high-speed read/write performance that grows linearly with the number of nodes, fully meeting the ever-growing data needs of radio telescopes such as the Square Kilometer Array. This distributed storage architecture provides a high-throughput, highly concurrent and highly scalable storage mechanism whose performance grows nearly linearly as the system is scaled up. It also guarantees high scalability together with data safety and reliability. Because the data processing system continuously records astronomical data that must not only be processed promptly but also remain in the file system for reuse and data mining over a period of time, the storage equipment must combine excellent performance with extremely high reliability, round-the-clock accessibility and a degree of fault tolerance. The distributed file system of the storage system also uses a fully symmetric distributed architecture; it can provide a single file system of ultra-large capacity and offers redundant backup and fast data-reconstruction mechanisms for maximum reliability. When data are written, the same file can be split into slices scattered over different hard disks of different storage nodes, and one or more redundant copies can be generated for the sliced data to guarantee reliability. Users can configure the redundancy flexibly according to the importance and performance requirements of their data. In case of a failure the system automatically reconstructs the data from the redundant copies, at a speed of about 2 TB of reconstructed data per hour, without affecting users at all.
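The slice-and-replicate placement described above can be illustrated with the following sketch (for illustration only; the node names and the round-robin placement policy are assumptions, not taken from the patent).

    # Illustrative slice-and-replicate placement across storage nodes
    # (node names and the round-robin policy are assumptions, not the patented scheme).
    def place_slices(file_bytes: bytes, storage_nodes: list, slice_size: int, redundancy: int = 1):
        """Split a file into slices and assign each slice plus its redundant copies
        to distinct storage nodes; return a placement map {slice_index: [nodes]}."""
        n_slices = -(-len(file_bytes) // slice_size)             # ceiling division
        placement = {}
        for idx in range(n_slices):
            placement[idx] = [storage_nodes[(idx + r) % len(storage_nodes)]
                              for r in range(1 + redundancy)]    # primary copy + redundant copies
        return placement

    # Example: a 1 MB synthetic file, 256 KB slices, one redundant copy, four storage nodes.
    layout = place_slices(b"\0" * (1024 * 1024), ["node-a", "node-b", "node-c", "node-d"],
                          slice_size=256 * 1024, redundancy=1)
    print(len(layout), "slices; slice 0 ->", layout[0])          # 4 slices; slice 0 -> ['node-a', 'node-b']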
The workflow of the data processing system for radio astronomical data intensive scientific operation according to one embodiment of the invention, namely the processing pipeline for the GLEAM-X continuum survey data of an SKA pilot telescope, is shown in fig. 4; this pipeline uses computing nodes of all three architecture types described above, according to need. The total data volume of the project is large (2 PB), and file-transfer operations account for a large proportion of the processing time (40%). When the data are processed, each full-day scan observation occupies 100 GB of storage, including visibility data, images and metadata. Given the depth of the GLEAM-X survey, the memory cache required for data processing reaches 3 TB, and the processing operates directly on data held in memory, so that the data are not moved around frequently.
Specifically, in step S1 the raw data are sent through an IB switch into the ultra-large-capacity memories of several ARM-architecture computing nodes to serve as ultra-large memory caches (Cache). In steps S2 and S3 the current computing nodes use the ultra-large memory cache to complete the compute-intensive tasks of the pipeline and store the resulting intermediate data in the flash-memory local storage units as flash caches (SSD Cache). In step S4 several x86-architecture computing nodes, of a different architecture type from the current nodes, read the SSD caches of the current nodes through the IB switch to exchange the intermediate data between computing nodes; they then become the new current nodes, return to step S2 to complete the data-intensive and memory-intensive tasks, and store the resulting intermediate data in their ultra-large-capacity memories as ultra-large memory caches. Finally, one or several CPU + GPU computing nodes, again of a different architecture type from the current nodes, read the ultra-large memory caches of the current nodes through the IB switch to exchange intermediate data, become the new current nodes, complete the extremely compute-intensive task to obtain the final data, store the final data in their flash-memory local storage units, and write them back through the IB switch to the distributed file system of the storage nodes for storage. With 10 ARM-architecture computing nodes, 23 x86-architecture computing nodes and 2 CPU + GPU computing nodes, tests on the prototype platform verified that the hybrid heterogeneous computing-node system improves performance by a factor of 2.23 over a traditional platform when imaging the massive continuum data produced by the SKA pilot telescope.
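The stage-to-architecture assignment described above can be written down compactly; the following Python mapping is only an illustration of the idea, and the stage and pool names are assumptions rather than identifiers from the patent.

    # Illustrative mapping of the GLEAM-X pipeline stages onto node pools
    # (stage and pool names are assumptions, not identifiers from the patent).
    GLEAM_X_PIPELINE = [
        # (stage, node architecture, cache used for its output)
        ("ingest raw visibilities",         "ARM",     "memory cache"),
        ("compute-intensive calibration",   "ARM",     "SSD cache"),
        ("data/memory-intensive imaging",   "x86",     "memory cache"),
        ("extreme compute-intensive step",  "CPU+GPU", "local SSD, then distributed FS"),
    ]

    def nodes_for(stage: str, pools: dict) -> list:
        """Return the node pool whose architecture is assigned to the given pipeline stage."""
        arch = next(a for s, a, _ in GLEAM_X_PIPELINE if s == stage)
        return pools[arch]

    pools = {"ARM": [f"arm-{i}" for i in range(10)],
             "x86": [f"x86-{i}" for i in range(23)],
             "CPU+GPU": ["gpu-0", "gpu-1"]}
    print(nodes_for("data/memory-intensive imaging", pools)[:3])   # ['x86-0', 'x86-1', 'x86-2']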
Fig. 5 shows the workflow of a data processing system for radio astronomical data intensive scientific operation according to another embodiment of the present invention, namely the spectral-line imaging pipeline of an SKA pilot telescope, which mainly uses 7 computing nodes of the x86 architecture to process and image the data of 7 frequency channels in parallel; this is a typical distributed processing task in astronomical scientific computing and also a data-intensive computing task. Specifically, in step S1 the raw data are sent through an IB switch into the ultra-large-capacity memories of the 7 x86 computing nodes to serve as ultra-large memory caches (Cache). The current computing nodes then use the ultra-large memory cache to complete the compute-intensive tasks of the pipeline, store the resulting final data in the flash-memory local storage units as flash caches (SSD Cache), and write the data back through the IB switch to the distributed file system of the storage nodes for storage.
Fig. 6 shows the workflow of a data processing system for radio astronomical data intensive scientific operation according to yet another embodiment of the present invention, namely the MWA pulsar-search pipeline of an SKA pilot telescope. The MWA (Murchison Widefield Array, a wide-field radio telescope array in Australia) pulsar-search project is a typical astronomical time-domain data-processing pipeline: the raw data are small (TB level), while the intermediate data are huge and numerous (PB level, millions of files) and do not need to be stored (they are too large and would consume too much storage); exchanging such data through distributed storage would therefore create a severe I/O bottleneck.
The workflow of the data processing system for this case is as follows. The raw data are sent through an IB switch into the ultra-large-capacity memories of several computing nodes of the CPU + GPU architecture to serve as ultra-large memory caches (Cache). The current computing nodes use the ultra-large memory cache to perform digital beamforming on the raw data (digital beamforming is an extremely compute-intensive task); the intermediate data generated are massive beam files, with each node producing roughly one to two thousand directories, each containing about two hundred files and occupying about 300 GB. These intermediate data are cached through the local storage units of the current computing nodes as SSD caches. Several computing nodes of the ARM architecture, of a different architecture type from the current nodes, read many directories of the SSD caches of the current nodes in parallel through the IB switch, realizing the intermediate-data exchange between computing nodes, and search the directories in parallel to complete this compute-intensive task. The final data obtained from the search are held as the memory cache of the ARM computing nodes and, being small (TB level), are written back through the IB switch to the distributed file system of the storage nodes for storage.
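How the beam-file directories produced by the GPU nodes might be spread over the ARM nodes for parallel searching can be sketched as follows (a round-robin assignment assumed purely for illustration; it is not the scheduling algorithm of the patent).

    # Illustrative round-robin distribution of beam directories to ARM search nodes
    # (an assumption for illustration; not the patented scheduling algorithm).
    from itertools import cycle

    def assign_directories(beam_dirs: list, arm_nodes: list) -> dict:
        """Spread beam-file directories evenly over the ARM nodes for parallel pulsar searching."""
        assignment = {node: [] for node in arm_nodes}
        for directory, node in zip(beam_dirs, cycle(arm_nodes)):
            assignment[node].append(directory)       # each node searches its own subset in parallel
        return assignment

    beam_dirs = [f"/ssd_cache/beam_{i:04d}" for i in range(2000)]   # ~2000 directories from one GPU node
    arm_nodes = [f"arm-{i}" for i in range(10)]
    plan = assign_directories(beam_dirs, arm_nodes)
    print(len(plan["arm-0"]), "directories assigned to arm-0")      # 200 directories per ARM node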
The hybrid heterogeneous computing platform is therefore suited both to different scientific application scenarios and to different steps within a single pipeline. It is a scheme worth referencing not only in radio astronomy but also in other fields of scientific computing, and it accumulates operational experience for the future large-scale expansion of data centers in radio astronomy.
The above embodiments are merely preferred embodiments of the present invention and are not intended to limit its scope; various changes may be made to them. All simple and equivalent changes and modifications made according to the claims and the content of the specification of the present application fall within the scope of the claims of the present patent application. Matters that are well known are not described in detail in order to avoid obscuring the invention.

Claims (9)

1. A data processing system for radio astronomical data intensive scientific operation, characterized by comprising at least one data constellation, wherein each data constellation is a scalable, integrated data unit installed in one cabinet or in a plurality of adjacent cabinets and consists of a scalable distributed storage system, a hybrid heterogeneous computing-node system and a network system;
each computing node is physically integrated with its own ultra-large-capacity memory and a flash-memory local storage unit, and the storage system consists of the local storage unit of each computing node and a distributed file system formed by the storage nodes; each data constellation has an independent distributed file system.
2. The data processing system for radio astronomical data intensive scientific operation of claim 1, wherein the hybrid heterogeneous computing-node system comprises at least computing nodes of the x86 CPU architecture, the ARM architecture and the x86 CPU + GPU architecture.
3. The data processing system for radio astronomical data intensive scientific operation of claim 2, wherein when a computing node uses the ARM architecture, its total memory-access bandwidth is 80 GB/s; and when a computing node uses the CPU + GPU architecture, its memory-access bandwidth is 2 TB/s.
4. The data processing system for the radio astronomical data intensive scientific arithmetic of claim 1, wherein the total memory capacity of the ultra-large memory of each compute node is 1TB to 2TB, the total memory capacity of the ultra-large memory of each compute node is adjusted accordingly according to the number of cores of the CPU, and the memory capacity corresponding to each core is not lower than 32 GB.
5. The data processing system for radio astronomical data intensive scientific arithmetic of claim 4, wherein the total memory capacity for a compute node with 32 cores is at least 1 TB.
6. The data processing system for radio astronomical data intensive scientific arithmetic of claim 1, wherein said local storage unit employs an NVMe SSD and said storage nodes employ HDDs.
7. The data processing system for radio astronomical data intensive scientific operations according to claim 1, wherein said distributed file system employs a fully distributed architecture and a fully symmetric distributed architecture.
8. The data processing system for radio astronomical data intensive scientific arithmetic of claim 1, wherein the network system comprises a plurality of IB switches connected to all the compute nodes and storage nodes, a network switch connected to all the compute nodes, storage nodes, background storage nodes and management nodes, and a plurality of user login nodes connected to the management nodes through the internet.
9. A data processing method for radio astronomical data intensive scientific operation, characterized by comprising the following steps:
step S0: providing a data processing system for radio astronomical data intensive scientific operation according to any one of claims 1 to 8;
step S1: sending the raw data through an IB switch into the ultra-large-capacity memory of the current computing node to serve as an ultra-large memory cache;
step S2: the current computing node processes its task and judges whether the result is intermediate data or final data; if it is intermediate data, proceeding to step S3; if it is final data, storing the final data in the ultra-large-capacity memory or the local storage unit of the computing node, writing the final data back through an IB switch to the distributed file system of the storage nodes, and ending the process;
step S3: the current computing node stores the obtained intermediate data, according to the storage requirement, either in the ultra-large-capacity memory as an ultra-large memory cache or in the flash-memory local storage unit as a flash cache;
step S4: computing nodes of an architecture type different from that of the current computing node read the ultra-large memory cache or the SSD cache of the current computing node through an IB switch to exchange intermediate data between computing nodes, then act as the new current computing node and return to step S2.
CN202210187329.8A 2022-02-28 2022-02-28 Data processing system and method for radio astronomical data intensive scientific operation Active CN114661637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210187329.8A CN114661637B (en) 2022-02-28 2022-02-28 Data processing system and method for radio astronomical data intensive scientific operation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210187329.8A CN114661637B (en) 2022-02-28 2022-02-28 Data processing system and method for radio astronomical data intensive scientific operation

Publications (2)

Publication Number Publication Date
CN114661637A true CN114661637A (en) 2022-06-24
CN114661637B CN114661637B (en) 2023-03-24

Family

ID=82027305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210187329.8A Active CN114661637B (en) 2022-02-28 2022-02-28 Data processing system and method for radio astronomical data intensive scientific operation

Country Status (1)

Country Link
CN (1) CN114661637B (en)

Patent Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090006808A1 (en) * 2007-06-26 2009-01-01 International Business Machines Corporation Ultrascalable petaflop parallel supercomputer
US20110055494A1 (en) * 2009-08-25 2011-03-03 Yahoo! Inc. Method for distributed direct object access storage
US9286261B1 (en) * 2011-11-14 2016-03-15 Emc Corporation Architecture and method for a burst buffer using flash technology
US9158540B1 (en) * 2011-11-14 2015-10-13 Emc Corporation Method and apparatus for offloading compute resources to a flash co-processing appliance
US20130304775A1 (en) * 2012-05-11 2013-11-14 Xyratex Technology Limited Storage unit for high performance computing system, storage network and methods
US20140137124A1 (en) * 2012-11-13 2014-05-15 Sandor Szalay System and Method for Program and Resource Allocation Within a Data-Intensive Computer
CN103237046A (en) * 2013-02-25 2013-08-07 中国科学院深圳先进技术研究院 Distributed file system supporting mixed cloud storage application and realization method thereof
CN103198097A (en) * 2013-03-11 2013-07-10 中国科学院计算机网络信息中心 Massive geoscientific data parallel processing method based on distributed file system
US20150261724A1 (en) * 2014-03-14 2015-09-17 Emilio Billi Massive parallel exascale storage system architecture
CN104023062A (en) * 2014-06-10 2014-09-03 上海大学 Heterogeneous computing-oriented hardware architecture of distributed big data system
CN104572569A (en) * 2015-01-21 2015-04-29 江苏微锐超算科技有限公司 ARM (Algorithmic Remote Manipulation) and FPGA (Field Programmable Gate Array)-based high performance computing node and computing method
CN104717297A (en) * 2015-03-30 2015-06-17 上海交通大学 Safety cloud storage method and system
CN105681402A (en) * 2015-11-25 2016-06-15 北京文云易迅科技有限公司 Distributed high speed database integration system based on PCIe flash memory card
CN105550238A (en) * 2015-11-27 2016-05-04 浪潮(北京)电子信息产业有限公司 Architecture system of database appliance
CN106299702A (en) * 2016-09-14 2017-01-04 中国科学院上海天文台 A kind of low frequency radio array digital bea mforming system and method
CN107733696A (en) * 2017-09-26 2018-02-23 南京天数信息科技有限公司 A kind of machine learning and artificial intelligence application all-in-one dispositions method
CN107908477A (en) * 2017-11-17 2018-04-13 郑州云海信息技术有限公司 A kind of data processing method and device for radio astronomy data
CN108763299A (en) * 2018-04-19 2018-11-06 贵州师范大学 A kind of large-scale data processing calculating acceleration system
CN208433991U (en) * 2018-08-21 2019-01-25 苏州超集信息科技有限公司 The hardware device of polymorphic type GPU mixing computing platform
CN111444020A (en) * 2020-03-31 2020-07-24 中国科学院计算机网络信息中心 Super-fusion computing system architecture and fusion service platform
US20220038285A1 (en) * 2020-07-30 2022-02-03 Dapper Labs Inc. Systems and methods providing specialized proof of confidential knowledge
CN112579696A (en) * 2020-11-26 2021-03-30 广州大学 Radio astronomical data multipoint synchronous copying method, device, server and storage medium
CN113742088A (en) * 2021-09-23 2021-12-03 上海交通大学 Pulsar search parallel optimization method for processing radio telescope data

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
JUNJIE HOU et al.: "FPGA-Based Scale-Out Prototyping of Degridding Algorithm for Accelerating Square Kilometre Array Telescope Data Processing", IEEE Access *
STEFANO CORDA et al.: "Near Memory Acceleration on High Resolution Radio Astronomy Imaging", 2020 9th Mediterranean Conference on Embedded Computing (MECO) *
AN TAO et al.: "Scientific applications and challenges of SKA big data", Bulletin of the Chinese Academy of Sciences *
YANG ZHERUI et al.: "Construction and management of a large-scale astronomical data analysis and multi-dimensional information visualization platform", E-Science Technology & Application *
MI LINYING et al.: "A brief analysis of the development approach of the National Astronomical Data Center", Journal of Agricultural Big Data *
GUO SHAOGUANG et al.: "Schemes and prospects for SKA massive data transmission", E-Science Technology & Application *

Also Published As

Publication number Publication date
CN114661637B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
US11741053B2 (en) Data management system, method, terminal and medium based on hybrid storage
Islam et al. High performance design for HDFS with byte-addressability of NVM and RDMA
Dong et al. Data elevator: Low-contention data movement in hierarchical storage system
Raicu et al. Accelerating large-scale data exploration through data diffusion
US20180107601A1 (en) Cache architecture and algorithms for hybrid object storage devices
CN101375241A (en) Efficient data management in a cluster file system
CN103873559A (en) Database all-in-one machine capable of realizing high-speed storage
US10042885B2 (en) Index table based routing for query resource optimization
Liu et al. Profiling and improving i/o performance of a large-scale climate scientific application
Islam et al. Accelerating I/O performance of big data analytics on HPC clusters through RDMA-based key-value store
CN105516313A (en) Distributed storage system used for big data
Li et al. ROLEX: A Scalable RDMA-oriented Learned Key-Value Store for Disaggregated Memory Systems
US11194522B2 (en) Networked shuffle storage
Shankar et al. Boldio: A hybrid and resilient burst-buffer over lustre for accelerating big data i/o
Sun et al. GraphMP: An efficient semi-external-memory big graph processing system on a single machine
Lu et al. Design and implementation of the tianhe-2 data storage and management system
CN114661637B (en) Data processing system and method for radio astronomical data intensive scientific operation
CN100383721C (en) Isomeric double-system bus objective storage controller
CN116382599A (en) Distributed cluster-oriented task execution method, device, medium and equipment
HeydariGorji et al. In-storage processing of I/O intensive applications on computational storage drives
Cheng et al. Accelerating scientific workflows with tiered data management system
Bae et al. Empirical guide to use of persistent memory for large-scale in-memory graph analysis
US10990530B1 (en) Implementation of global counters using locally cached counters and delta values
CN111666253B (en) Delivering programmable data to a system having shared processing elements sharing memory
Ruan et al. Improving Shuffle I/O performance for big data processing using hybrid storage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant