CN117131000A - NetCDF meteorological data processing method and terminal - Google Patents

NetCDF meteorological data processing method and terminal

Publication number
CN117131000A
Authority
CN
China
Prior art keywords
data
computing
netcdf
node
calculation
Prior art date
Legal status
Granted
Application number
CN202311370337.7A
Other languages
Chinese (zh)
Other versions
CN117131000B (en
Inventor
吴弘毅
单森华
洪水洁
徐能通
林永清
黄凯悦
陈新伟
梁培栋
Current Assignee
Istrong Technology Co ltd
Original Assignee
Istrong Technology Co ltd
Application filed by Istrong Technology Co ltd
Priority to CN202311370337.7A
Publication of CN117131000A
Application granted
Publication of CN117131000B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a NetCDF meteorological data processing method and terminal. A plurality of computing nodes parse the NetCDF meteorological data distributed by a scheduling node and store the resulting relational spatiotemporal data in the local cache corresponding to each computing node. The local cache asynchronously updates the relational spatiotemporal data to a data arrangement layer, which stores the data on distributed storage nodes. When the scheduling node receives computing tasks corresponding to the relational spatiotemporal data, it distributes the tasks to different computing nodes according to the number of available computing nodes. Each target computing node that is assigned a task reads the data to be computed from the distributed storage nodes through its corresponding local cache and pulls that data from the local cache into its own memory. This increases the processing speed for large-scale NetCDF meteorological data, and the data arrangement layer provides a unified access location, which improves meteorological data processing efficiency.

Description

NetCDF meteorological data processing method and terminal
Technical Field
The invention relates to the technical field of meteorological data management, in particular to a NetCDF meteorological data processing method and a terminal.
Background
The prior art has implemented automatic classification, metadata identification, data parsing, spatiotemporal dimension alignment and data integration for NetCDF (Network Common Data Form) files, but it does not address the processing of large-scale meteorological data. When the raw data volume exceeds 10 TB, a single hard disk cannot hold all the downloaded data, and multi-process parallel processing is required to increase the data integration speed. Conventional meteorological data processing schemes therefore need to be improved.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a NetCDF meteorological data processing method and terminal that can improve meteorological data processing efficiency.
In order to solve the technical problems, the invention adopts the following technical scheme:
a NetCDF meteorological data processing method comprises the following steps:
a plurality of computing nodes parse the NetCDF meteorological data distributed by a scheduling node to obtain relational spatiotemporal data, and store the relational spatiotemporal data in the local cache corresponding to each computing node;
the local cache asynchronously updates the relational spatiotemporal data to a data arrangement layer, and the data arrangement layer stores the relational spatiotemporal data on distributed storage nodes;
the scheduling node receives computing tasks corresponding to the relational spatiotemporal data, and distributes the computing tasks to different computing nodes according to the number of available computing nodes to obtain target computing nodes;
and each target computing node reads the data to be computed for its computing task from the distributed storage nodes through its corresponding local cache, and pulls the data to be computed from the local cache into its own memory for computation, thereby obtaining a computing result.
In order to solve the technical problems, the invention adopts another technical scheme that:
a NetCDF meteorological data processing terminal, comprising:
a plurality of computing nodes, configured to parse the NetCDF meteorological data distributed by the scheduling node to obtain relational spatiotemporal data, and to store the relational spatiotemporal data in the local cache corresponding to each computing node; a target computing node that has been assigned a task is further configured to read the data to be computed for the computing task from the distributed storage nodes through its corresponding local cache, and to pull the data to be computed from the local cache into its own memory for computation, thereby obtaining a computing result;
the local cache, configured to asynchronously update the relational spatiotemporal data to the data arrangement layer and to read the data to be computed for the computing task from the distributed storage nodes;
the data arrangement layer, configured to store the relational spatiotemporal data on the distributed storage nodes;
the scheduling node, configured to receive the computing tasks corresponding to the relational spatiotemporal data, and to distribute the computing tasks to different computing nodes according to the number of available computing nodes to obtain target computing nodes;
and the distributed storage nodes, configured to store the relational spatiotemporal data.
The invention has the following beneficial effects: a plurality of computing nodes parse the NetCDF meteorological data distributed by the scheduling node and store the resulting relational spatiotemporal data in their corresponding local caches; the local cache asynchronously updates the data to the data arrangement layer, which stores it on the distributed storage nodes. When the scheduling node receives computing tasks corresponding to the relational spatiotemporal data, it distributes them to different computing nodes according to the number of available computing nodes; each target computing node reads the data to be computed from the distributed storage nodes through its local cache and pulls it from the local cache into its own memory for computation. Processing the NetCDF meteorological data in parallel on multiple computing nodes increases the processing speed for large-scale data, and the data arrangement layer, acting as an intermediate layer between the computing nodes and the distributed storage nodes, provides a unified access location, thereby improving meteorological data processing efficiency.
Drawings
FIG. 1 is a flow chart of steps of a NetCDF weather data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a NetCDF meteorological data processing terminal according to an embodiment of the present invention.
Detailed Description
In order to describe the technical contents, the achieved objects and effects of the present invention in detail, the following description will be made with reference to the embodiments in conjunction with the accompanying drawings.
Referring to FIG. 1, a NetCDF meteorological data processing method includes the steps of:
a plurality of computing nodes parse the NetCDF meteorological data distributed by a scheduling node to obtain relational spatiotemporal data, and store the relational spatiotemporal data in the local cache corresponding to each computing node;
the local cache asynchronously updates the relational spatiotemporal data to a data arrangement layer, and the data arrangement layer stores the relational spatiotemporal data on distributed storage nodes;
the scheduling node receives computing tasks corresponding to the relational spatiotemporal data, and distributes the computing tasks to different computing nodes according to the number of available computing nodes to obtain target computing nodes;
and each target computing node reads the data to be computed for its computing task from the distributed storage nodes through its corresponding local cache, and pulls the data to be computed from the local cache into its own memory for computation, thereby obtaining a computing result.
From the above description, the beneficial effects of the invention are as follows: a plurality of computing nodes parse the NetCDF meteorological data distributed by the scheduling node and store the resulting relational spatiotemporal data in their corresponding local caches; the local cache asynchronously updates the data to the data arrangement layer, which stores it on the distributed storage nodes. When the scheduling node receives computing tasks corresponding to the relational spatiotemporal data, it distributes them to different computing nodes according to the number of available computing nodes; each target computing node reads the data to be computed from the distributed storage nodes through its local cache and pulls it from the local cache into its own memory for computation. Processing the NetCDF meteorological data in parallel on multiple computing nodes increases the processing speed for large-scale data, and the data arrangement layer, acting as an intermediate layer between the computing nodes and the distributed storage nodes, provides a unified access location, thereby improving meteorological data processing efficiency.
Further, before the plurality of computing nodes parse the NetCDF meteorological data distributed by the scheduling node to obtain the relational spatiotemporal data, the method comprises the following steps:
the scheduling node collects NetCDF meteorological data and monitors the available states of a plurality of computing nodes;
the scheduling node determines the number of available computing nodes according to the available states, and uses a hash function to calculate hash values of the file names of the NetCDF meteorological data according to the number of available computing nodes;
and the scheduling node groups the NetCDF meteorological data according to the hash values to obtain grouped NetCDF meteorological data, and distributes the grouped NetCDF meteorological data to the plurality of available computing nodes.
As can be seen from the above description, the scheduling node monitors the available states of the computing nodes, uses a hash function to calculate the hash value of each NetCDF file name according to the number of available computing nodes, groups the NetCDF meteorological data by hash value, and distributes the grouped data to the available computing nodes.
Further, the step in which the plurality of computing nodes parse the NetCDF meteorological data distributed by the scheduling node to obtain the relational spatiotemporal data includes:
the plurality of computing nodes determine whether the NetCDF meteorological data distributed by the scheduling node conforms to a preset grid precision and, if not, process the NetCDF meteorological data with an interpolation algorithm to obtain NetCDF meteorological data that conforms to the preset grid precision;
the conforming NetCDF meteorological data is divided into a plurality of grids by longitude and latitude, wherein each grid contains the time-series data of the corresponding NetCDF meteorological data;
and the time-series data of each grid is processed with an interpolation algorithm to obtain the relational spatiotemporal data.
As can be seen from the above description, the plurality of computing nodes align the spatial and temporal dimensions of the NetCDF meteorological data to obtain the relational spatiotemporal data, so that the NetCDF meteorological data is processed effectively and subsequent storage and analysis are easier.
Further, the step of distributing the computing tasks to different computing nodes according to the number of available computing nodes to obtain the target computing nodes includes:
the scheduling node parses the computing task by task dimension to obtain parsed computing tasks, and issues the parsed computing tasks to the computing nodes in order of task-dimension granularity from small to large;
for each issued parsed computing task, the scheduling node performs a modulo operation by the number of available computing nodes to obtain a modulus value;
the modulus value is matched against the sequence numbers of the available computing nodes to obtain a matching result;
and the parsed computing tasks are distributed to the available computing nodes according to the matching result to obtain the target computing nodes.
As can be seen from the above description, the scheduling node issues the parsed computing tasks in order of task-dimension granularity from small to large, so that each computation can directly reuse the result of the previous one, which improves processing efficiency. Distributing the computing tasks according to the match between the modulus value and the sequence numbers of the available computing nodes spreads the tasks evenly across the computing nodes, so that the tasks complete more quickly.
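The dispatch rule above can be illustrated with a minimal Python sketch; it is not the patented implementation. The `SubTask` fields, the granularity ranks, and the choice of the task's sequence number as the quantity reduced modulo the node count are assumptions, since the description does not state which value is taken modulo.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SubTask:
    seq: int          # hypothetical sequence number of the parsed task
    dimension: str    # e.g. "hour", "day", "month"
    granularity: int  # assumed rank: a smaller value means finer granularity

def dispatch(sub_tasks, available_nodes):
    """Return (sub_task, node) pairs, finest granularity first."""
    if not available_nodes:
        raise ValueError("no available computing nodes")
    # Issue tasks in order of granularity from small to large.
    ordered = sorted(sub_tasks, key=lambda t: (t.granularity, t.seq))
    plan = []
    for task in ordered:
        modulus = task.seq % len(available_nodes)      # modulus value
        plan.append((task, available_nodes[modulus]))  # node whose number matches
    return plan

tasks = [
    SubTask(0, "month", 3),
    SubTask(1, "hour", 1),
    SubTask(2, "day", 2),
    SubTask(3, "hour", 1),
]
plan = dispatch(tasks, ["node-0", "node-1", "node-2"])
```

In this sketch the position of a node in `available_nodes` plays the role of its sequence number, so removing a failed node from the list changes the mapping for subsequent tasks.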
Further, the step in which the target computing node reads the data to be computed for the computing task from the distributed storage nodes through its corresponding local cache, and pulls the data to be computed from the local cache into its own memory for computation to obtain a computing result, comprises:
the target computing node sends a data reading request to the data arrangement layer, wherein the data reading request indicates the data to be computed for the computing task;
the data arrangement layer reads the data to be computed from the distributed storage nodes into the local cache corresponding to the target computing node according to the data reading request;
the target computing node pulls the data to be computed from the local cache into its own memory, and performs the computation on the data to obtain a computing result;
after the target computing node has read the data to be computed through its corresponding local cache, pulled it into its own memory and obtained the computing result, the method further comprises:
the target computing node stores the computing result in the local cache and sends computation completion information to the data arrangement layer;
the data arrangement layer acquires the computing result from the local cache according to the computation completion information, and determines the distributed storage node to which the data is to be written;
the data arrangement layer converts the computing result according to the storage form of that distributed storage node to obtain a converted computing result, and asynchronously writes the converted computing result to that distributed storage node.
As can be seen from the above description, after a computing task is completed, the computing node stores the computing result in the local cache; the data arrangement layer then acquires the result from the local cache, converts it according to the storage form of the distributed storage node, and asynchronously writes the converted result to the distributed storage node, so the scheme can adapt to distributed storage nodes with multiple storage forms.
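The read and write paths through the data arrangement layer can be sketched as follows. This is only an illustration under stated assumptions: the class `DataArrangementLayer`, its method names, the use of plain dictionaries as the local cache and the storage backend, and JSON as the backend's storage form are all hypothetical and not taken from the patent.

```python
import json
from concurrent.futures import ThreadPoolExecutor

class DataArrangementLayer:
    """Hypothetical intermediate layer between local caches and storage backends."""

    def __init__(self, stores):
        # stores: name -> (backend mapping, converter for that backend's storage form)
        self.stores = stores
        self._pool = ThreadPoolExecutor(max_workers=2)

    def load_into_cache(self, cache, store_name, key):
        """Serve a data reading request: copy the item from storage into the local cache."""
        backend, _ = self.stores[store_name]
        cache[key] = backend[key]

    def on_compute_done(self, cache, store_name, key):
        """Fetch the result from the cache, convert it, and write it asynchronously."""
        backend, convert = self.stores[store_name]
        payload = convert(cache[key])
        return self._pool.submit(backend.__setitem__, key, payload)

    def close(self):
        self._pool.shutdown(wait=True)

object_store = {"grid-7/raw": [1.0, 2.0, 3.0]}
layer = DataArrangementLayer({"object": (object_store, json.dumps)})
local_cache = {}

layer.load_into_cache(local_cache, "object", "grid-7/raw")
data = local_cache["grid-7/raw"]                    # pulled into the node's own memory
local_cache["grid-7/mean"] = sum(data) / len(data)  # result stored in the local cache
future = layer.on_compute_done(local_cache, "object", "grid-7/mean")
future.result()                                     # wait only so the example is deterministic
layer.close()
```

A different backend would be registered with its own converter, which is how one layer could serve storage nodes with several storage forms.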
Referring to FIG. 2, another embodiment of the present invention provides a NetCDF meteorological data processing terminal, including:
a plurality of computing nodes, configured to parse the NetCDF meteorological data distributed by the scheduling node to obtain relational spatiotemporal data, and to store the relational spatiotemporal data in the local cache corresponding to each computing node; a target computing node that has been assigned a task is further configured to read the data to be computed for the computing task from the distributed storage nodes through its corresponding local cache, and to pull the data to be computed from the local cache into its own memory for computation, thereby obtaining a computing result;
the local cache, configured to asynchronously update the relational spatiotemporal data to the data arrangement layer and to read the data to be computed for the computing task from the distributed storage nodes;
the data arrangement layer, configured to store the relational spatiotemporal data on the distributed storage nodes;
the scheduling node, configured to receive the computing tasks corresponding to the relational spatiotemporal data, and to distribute the computing tasks to different computing nodes according to the number of available computing nodes to obtain target computing nodes;
and the distributed storage nodes, configured to store the relational spatiotemporal data.
From the above description, the beneficial effects of the invention are as follows: a plurality of computing nodes parse the NetCDF meteorological data distributed by the scheduling node and store the resulting relational spatiotemporal data in their corresponding local caches; the local cache asynchronously updates the data to the data arrangement layer, which stores it on the distributed storage nodes. When the scheduling node receives computing tasks corresponding to the relational spatiotemporal data, it distributes them to different computing nodes according to the number of available computing nodes; each target computing node reads the data to be computed from the distributed storage nodes through its local cache and pulls it from the local cache into its own memory for computation. Processing the NetCDF meteorological data in parallel on multiple computing nodes increases the processing speed for large-scale data, and the data arrangement layer, acting as an intermediate layer between the computing nodes and the distributed storage nodes, provides a unified access location, thereby improving meteorological data processing efficiency.
Further, the scheduling node is further configured to:
collect NetCDF meteorological data and monitor the available states of a plurality of computing nodes;
determine the number of available computing nodes according to the available states, and use a hash function to calculate hash values of the file names of the NetCDF meteorological data according to the number of available computing nodes;
and group the NetCDF meteorological data according to the hash values to obtain grouped NetCDF meteorological data, and distribute the grouped NetCDF meteorological data to the plurality of available computing nodes.
As can be seen from the above description, the scheduling node monitors the available states of the computing nodes, uses a hash function to calculate the hash value of each NetCDF file name according to the number of available computing nodes, groups the NetCDF meteorological data by hash value, and distributes the grouped data to the available computing nodes.
Further, the plurality of computing nodes are specifically configured to:
determine whether the NetCDF meteorological data distributed by the scheduling node conforms to a preset grid precision and, if not, process the NetCDF meteorological data with an interpolation algorithm to obtain NetCDF meteorological data that conforms to the preset grid precision;
divide the conforming NetCDF meteorological data into a plurality of grids by longitude and latitude, wherein each grid contains the time-series data of the corresponding NetCDF meteorological data;
and process the time-series data of each grid with an interpolation algorithm to obtain the relational spatiotemporal data.
As can be seen from the above description, the plurality of computing nodes align the spatial and temporal dimensions of the NetCDF meteorological data to obtain the relational spatiotemporal data, so that the NetCDF meteorological data is processed effectively and subsequent storage and analysis are easier.
Further, the scheduling node is specifically configured to:
parse the computing task by task dimension to obtain parsed computing tasks, and issue the parsed computing tasks to the computing nodes in order of task-dimension granularity from small to large;
for each issued parsed computing task, perform a modulo operation by the number of available computing nodes to obtain a modulus value;
match the modulus value against the sequence numbers of the available computing nodes to obtain a matching result;
and distribute the parsed computing tasks to the available computing nodes according to the matching result to obtain the target computing nodes.
As can be seen from the above description, the scheduling node issues the parsed computing tasks in order of task-dimension granularity from small to large, so that each computation can directly reuse the result of the previous one, which improves processing efficiency. Distributing the computing tasks according to the match between the modulus value and the sequence numbers of the available computing nodes spreads the tasks evenly across the computing nodes, so that the tasks complete more quickly.
Further, the target computing node is further configured to:
send a data reading request to the data arrangement layer, wherein the data reading request indicates the data to be computed for the computing task;
pull the data to be computed from the local cache into its own memory, and perform the computation on the data to obtain a computing result;
store the computing result in the local cache, and send computation completion information to the data arrangement layer;
the data arrangement layer is further configured to:
read the data to be computed from the distributed storage nodes into the local cache according to the data reading request;
acquire the computing result from the local cache according to the computation completion information, and determine the distributed storage node to which the data is to be written;
and convert the computing result according to the storage form of that distributed storage node to obtain a converted computing result, and asynchronously write the converted computing result to that distributed storage node.
As can be seen from the above description, after a computing task is completed, the computing node stores the computing result in the local cache; the data arrangement layer then acquires the result from the local cache, converts it according to the storage form of the distributed storage node, and asynchronously writes the converted result to the distributed storage node, so the scheme can adapt to distributed storage nodes with multiple storage forms.
The NetCDF weather data processing method and the terminal provided by the invention can be applied to a large-scale NetCDF weather data processing scene, and the following description is given by a specific embodiment:
Referring to fig. 1, a first embodiment of the present invention is as follows:
a NetCDF meteorological data processing method comprises the following steps:
S1, the scheduling node collects NetCDF meteorological data and monitors the available states of a plurality of computing nodes.
In an alternative embodiment, the software and hardware environment required for the task is set up before S1 is performed. The NetCDF meteorological data is meteorological data in NetCDF format, and the available state is either available or unavailable.
Specifically, the collected NetCDF meteorological data is first transmitted to the data arrangement layer, which stores it and then transmits it to the scheduling node; meanwhile, the scheduling node monitors the available states of the plurality of computing nodes.
S2, the scheduling node determines the number of available computing nodes according to the available states, and uses a hash function to calculate hash values of the file names of the NetCDF meteorological data according to the number of available computing nodes.
S3, the scheduling node groups the NetCDF meteorological data according to the hash values to obtain grouped NetCDF meteorological data, and distributes the grouped NetCDF meteorological data to the plurality of available computing nodes.
Specifically, the scheduling node takes the hash value modulo the number of available computing nodes to obtain a remainder, and distributes the NetCDF meteorological data to the available computing node corresponding to that remainder; for example, if the remainder is 0, the corresponding NetCDF meteorological data is allocated to the available computing node numbered 0. In an alternative embodiment, the computing node numbers are dynamic during processing: if node 3 fails, the node previously numbered 4 becomes node 3, and so on.
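Steps S2 and S3 can be sketched in Python as follows. This is a minimal illustration: MD5 is an assumed choice (the description only says "a hash function"), and the file names and node names are invented for the example.

```python
import hashlib
from collections import defaultdict

def stable_hash(file_name: str) -> int:
    """Deterministic hash of a file name (MD5 is an assumption, not from the patent)."""
    digest = hashlib.md5(file_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def group_files(file_names, available_nodes):
    """Assign each file to the available node whose number equals hash % node count."""
    if not available_nodes:
        raise ValueError("no available computing nodes")
    groups = defaultdict(list)
    for name in file_names:
        remainder = stable_hash(name) % len(available_nodes)
        groups[available_nodes[remainder]].append(name)
    return dict(groups)

files = [f"sample_{day:03d}.nc" for day in range(1, 9)]  # hypothetical file names
nodes = ["node-0", "node-1", "node-2", "node-3", "node-4"]
before = group_files(files, nodes)

# Dynamic numbering: if node-3 fails it is dropped from the list, so node-4
# moves into position 3 and the modulus becomes 4 instead of 5.
after = group_files(files, [n for n in nodes if n != "node-3"])
```

A deterministic hash is used here rather than Python's built-in `hash`, which is salted per process for strings and would give different groupings on different runs.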
S4, the plurality of computing nodes parse the NetCDF meteorological data distributed by the scheduling node to obtain relational spatiotemporal data, and store the relational spatiotemporal data in the local cache corresponding to each computing node, which specifically comprises S41-S44:
S41, the plurality of computing nodes determine whether the NetCDF meteorological data distributed by the scheduling node conforms to the preset grid precision; if not, the NetCDF meteorological data is processed with an interpolation algorithm to obtain NetCDF meteorological data that conforms to the preset grid precision; if so, S42 is executed directly.
Each grid represents a planar area of a certain extent, and the preset grid precision is set according to actual conditions, for example 10 km × 10 km.
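One possible form of the S41 check and interpolation is sketched below with NumPy. The description does not name the interpolation algorithm, so bilinear interpolation is an assumption, as are the function names and the use of a grid step in degrees rather than kilometres.

```python
import numpy as np

def conforms(coords, step):
    """True if the coordinate axis is already spaced at the preset step."""
    return bool(np.allclose(np.diff(coords), step))

def regrid_bilinear(values, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinear regridding on a rectilinear grid; values has shape (lat, lon)."""
    # Interpolate along longitude for every source latitude row ...
    along_lon = np.array([np.interp(dst_lon, src_lon, row) for row in values])
    # ... then along latitude for every target longitude column.
    return np.array([np.interp(dst_lat, src_lat, col) for col in along_lon.T]).T

src_lat = np.array([0.0, 1.0, 2.0])
src_lon = np.array([0.0, 1.0, 2.0, 3.0])
field = 2.0 * src_lat[:, None] + 3.0 * src_lon[None, :]  # synthetic test field

step = 0.5  # assumed preset grid precision, in degrees
if not (conforms(src_lat, step) and conforms(src_lon, step)):
    dst_lat = np.arange(src_lat[0], src_lat[-1] + step / 2, step)
    dst_lon = np.arange(src_lon[0], src_lon[-1] + step / 2, step)
    field = regrid_bilinear(field, src_lat, src_lon, dst_lat, dst_lon)
```

`np.interp` expects increasing source coordinates, so a latitude axis stored from north to south would need to be reversed first.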
S42, the NetCDF meteorological data conforming to the preset grid precision is divided into a plurality of grids by longitude and latitude, wherein each grid contains the time-series data of the corresponding NetCDF meteorological data.
In an alternative embodiment, each grid is numbered, an association between the grid number and the longitude and latitude range of the grid is established, and the association is stored. As a result, a grid need not be looked up by longitude and latitude during computation; it can be retrieved directly by grid number, which is faster and simpler.
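The association between grid numbers and longitude/latitude ranges might look like the following sketch. Row-by-row numbering and the tuple layout `(lat_low, lat_high, lon_low, lon_high)` are assumptions made for illustration.

```python
def build_grid_index(lat_min, lat_max, lon_min, lon_max, step):
    """Number cells row by row and record each cell's latitude/longitude bounds."""
    n_rows = round((lat_max - lat_min) / step)
    n_cols = round((lon_max - lon_min) / step)
    index = {}
    for r in range(n_rows):
        for c in range(n_cols):
            index[r * n_cols + c] = (
                lat_min + r * step, lat_min + (r + 1) * step,
                lon_min + c * step, lon_min + (c + 1) * step,
            )
    return index

# A 2 x 3 grid of 1-degree cells; cell 4 sits in row 1, column 1.
index = build_grid_index(0.0, 2.0, 0.0, 3.0, 1.0)
bounds_of_cell_4 = index[4]
```

Once the index is stored, later computations can refer to a cell by its integer number alone.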
S43, the time-series data of each grid is processed with an interpolation algorithm to obtain the relational spatiotemporal data.
After processing with the interpolation algorithm, the time-series data of every grid becomes relational spatiotemporal data at the same frequency.
In an alternative embodiment, if the relational spatiotemporal data contains outliers or missing values, a repair function is used to compute repair values for them, or the outliers or missing values are left empty.
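Bringing one grid's time series onto a common frequency can be sketched as below. Linear interpolation and timestamps expressed as integer seconds are assumptions; the description only refers to "an interpolation algorithm".

```python
import numpy as np

def align_series(times, values, start, stop, step):
    """Resample an irregular series onto a regular time axis by linear interpolation."""
    target = np.arange(start, stop + 1, step)  # integer seconds, stop inclusive
    return target, np.interp(target, times, values)

# Observations at 0 h, 1 h and 3 h, resampled to an hourly axis.
times = np.array([0, 3600, 10800])
values = np.array([10.0, 12.0, 18.0])
target, aligned = align_series(times, values, start=0, stop=10800, step=3600)
```

Applying the same `start`, `stop` and `step` to every grid yields series that share one time axis and can be stored as rows keyed by grid number and timestamp.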
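A repair function of the kind mentioned above could take the following form. This is a sketch under assumptions: outliers are defined by fixed lower and upper bounds, missing values are represented as NaN, and repair values come from linear interpolation between valid neighbours; none of these choices is specified in the description.

```python
import numpy as np

def repair(values, lower, upper, fill=True):
    """Mark NaNs and out-of-range values as bad, then interpolate or leave them empty."""
    v = np.asarray(values, dtype=float).copy()
    with np.errstate(invalid="ignore"):
        bad = np.isnan(v) | (v < lower) | (v > upper)
    v[bad] = np.nan                       # "left empty" case
    if fill and (~bad).any():
        idx = np.arange(v.size)
        v[bad] = np.interp(idx[bad], idx[~bad], v[~bad])
    return v

series = [1.0, float("nan"), 3.0, 999.0, 5.0]
repaired = repair(series, lower=-50.0, upper=60.0)
left_empty = repair(series, lower=-50.0, upper=60.0, fill=False)
```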
S44. The relational spatiotemporal data is stored in the local cache corresponding to the computing node.
In an optional implementation, the computing node sorts the relational spatiotemporal data by grid number and then by time, and stores the sorted relational spatiotemporal data in the local cache corresponding to the computing node.
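The sorting described above can be sketched in a few lines. The row layout `(grid_number, time, value)` and the function names are assumptions for illustration; the point is only that, once rows are ordered by grid number and time, all rows of one grid form a contiguous run that can be read sequentially.

```python
from bisect import bisect_left, bisect_right


def sort_rows(rows):
    """Order rows by grid number first and by time second."""
    return sorted(rows, key=lambda row: (row[0], row[1]))


def rows_for_grid(sorted_rows, grid_number):
    """Return the contiguous slice holding every row of one grid."""
    keys = [row[0] for row in sorted_rows]
    lo = bisect_left(keys, grid_number)
    hi = bisect_right(keys, grid_number)
    return sorted_rows[lo:hi]


# Hypothetical rows: (grid number, ISO time, value).
rows = [
    (2, "2023-10-23T01", 15.1),
    (1, "2023-10-23T01", 12.4),
    (2, "2023-10-23T00", 14.8),
    (1, "2023-10-23T00", 12.0),
]
ordered = sort_rows(rows)
grid_two = rows_for_grid(ordered, 2)
```

ISO 8601 time strings are used here because their lexicographic order matches chronological order.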
Data exchanged between a computing node and the data arrangement layer travels over the network, which consumes bandwidth and I/O and introduces transmission delay. The local cache holds data on the computing node or at a location close to it, so that the computing node can compute locally or nearly locally, which reduces the transmission delay.
In an optional implementation, the computing node processes data in a data lakehouse format that supports ACID transactions and data compression. Transactions control parallel writes so that conflicting simultaneous writes do not cause write failures, and compression reduces storage and transmission costs.
S5. The local cache asynchronously updates the relational spatiotemporal data to the data arrangement layer, and the data arrangement layer stores the relational spatiotemporal data to the distributed storage nodes.
In an optional implementation, the local cache asynchronously updates the sorted relational spatiotemporal data to the data arrangement layer, and the data arrangement layer stores the sorted relational spatiotemporal data to the distributed storage nodes. Because the data is stored in order of grid number and time, later reads by grid number are sequential, which keeps the read speed as high as possible.
S6. The scheduling node receives the computing tasks corresponding to the relational spatiotemporal data, and distributes the computing tasks to different computing nodes according to the number of available computing nodes to obtain the target computing nodes.
To complete multiple rounds of computation for different purposes, a traditional scheme repeatedly stores data to and pulls data from the storage nodes. In the present invention, the scheduling node, the data arrangement layer and the local caches of the computing nodes work together to make data computation more efficient. This step specifically comprises S61-S65:
S61. The scheduling node receives a computing task corresponding to the relational spatiotemporal data.
S62. The scheduling node parses the computing task by task dimension to obtain parsed computing tasks, and issues the parsed computing tasks to the computing nodes in order of task-dimension granularity from small to large.
For example, if the meteorological data must be statistically analyzed by hour, day, month and year, the computing task of the finer-grained dimension is executed first, so that the output of one stage can be used directly as the input of the next stage, which improves computing efficiency.
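The benefit of issuing tasks from fine to coarse granularity can be illustrated with a small roll-up sketch. The task dictionaries, the `GRANULARITY` table, and the choice to carry `(sum, count)` pairs between stages (so that coarser means remain exact) are assumptions for this example and are not prescribed by the text.

```python
GRANULARITY = {"hour": 0, "day": 1, "month": 2, "year": 3}


def order_tasks(tasks):
    """Issue finer-grained tasks first so each stage can feed the next."""
    return sorted(tasks, key=lambda task: GRANULARITY[task["dimension"]])


def roll_up(stats, key_length):
    """Aggregate {period: (sum, count)} to a coarser period.

    Periods are ISO strings such as '2023-10-23T05'; truncating the key to
    key_length characters yields the enclosing day (10) or month (7).
    """
    coarser = {}
    for period, (total, count) in stats.items():
        key = period[:key_length]
        old_total, old_count = coarser.get(key, (0.0, 0))
        coarser[key] = (old_total + total, old_count + count)
    return coarser


tasks = [{"dimension": "month"}, {"dimension": "hour"}, {"dimension": "day"}]
issued = [task["dimension"] for task in order_tasks(tasks)]

hourly = {
    "2023-10-23T00": (10.0, 1),
    "2023-10-23T01": (14.0, 1),
    "2023-10-24T00": (8.0, 1),
}
daily = roll_up(hourly, 10)    # reuses the hourly output
monthly = roll_up(daily, 7)    # reuses the daily output
```

The monthly stage never touches the hourly rows again, which is the reuse of the previous stage's output that the example in the text describes.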
S63. For each issued parsed computing task, the scheduling node takes the grid number in the parsed computing task modulo the number of available computing nodes to obtain a modulus value.
S64. The modulus value is matched against the sequence numbers of the available computing nodes to obtain a matching result.
In an optional implementation, a correspondence between grid numbers and computing nodes is established from the matching result, so that when a computing node becomes abnormal, a standby node can take over and complete its work according to the correspondence.
S65. The parsed computing tasks are distributed to the available computing nodes according to the matching result to obtain the target computing nodes, so that the computing tasks are distributed evenly across the available computing nodes.
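Steps S63-S65 and the optional standby mechanism can be sketched as follows. The node names, the use of list positions as node sequence numbers, and the functions `assign_by_modulo` and `fail_over` are hypothetical.

```python
def assign_by_modulo(grid_numbers, available_nodes):
    """Map each grid number to the node whose sequence number equals
    grid_number mod (number of available nodes)."""
    n = len(available_nodes)
    return {g: available_nodes[g % n] for g in grid_numbers}


def fail_over(mapping, failed_node, standby_node):
    """Hand every grid of a failed node to a standby node, using the stored
    grid-to-node correspondence; other assignments are unchanged."""
    return {
        grid: (standby_node if node == failed_node else node)
        for grid, node in mapping.items()
    }


nodes = ["node-a", "node-b", "node-c"]
mapping = assign_by_modulo(range(6), nodes)
recovered = fail_over(mapping, failed_node="node-b", standby_node="standby-1")
```

With consecutive grid numbers, the modulo rule gives each node the same number of grids to within one, which is the even distribution that S65 refers to.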
S7. The target computing node reads the data to be computed for the computing task from the distributed storage nodes through its corresponding local cache, and pulls the data to be computed from the local cache into its own memory for computation to obtain a computing result.
In a traditional storage-compute separation scheme, a task runs serially: data is pulled from the storage node into the memory of the computing node and only then computed. With the data arrangement layer added, the task is split into two concurrent processes: pulling data from the storage node to the local cache, and reading data from the local cache into memory for computation. Data is thus pulled while computation proceeds, which alleviates the high latency of remote reads caused by bandwidth bottlenecks. This step specifically comprises S71-S73:
S71. The target computing node sends a data reading request to the data arrangement layer, wherein the data reading request specifies the data to be computed for the computing task.
In an optional implementation, the data reading request includes the grid number of the data to be computed for the computing task, so that the data is retrieved directly by grid number, which speeds up data retrieval.
S72. The data arrangement layer reads the data to be computed from the distributed storage nodes into the local cache according to the data reading request. Because the data to be computed is then held in the local cache, any further computation on it reads directly from the local cache instead of pulling from the distributed storage nodes again, which saves data transmission time.
S73. The target computing node pulls the data to be computed from the local cache into its own memory and computes on it to obtain a computing result.
That is, the data is first pulled from the distributed storage nodes to the local cache, and the target computing node then pulls it from the local cache into its own memory for computation.
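The idea of pulling while computing can be illustrated with a producer-consumer sketch. Here a bounded in-memory queue stands in for the local cache and a plain Python iterable stands in for the distributed storage nodes; both, along with the name `pull_and_compute`, are simplifying assumptions and not the data arrangement layer itself.

```python
import queue
import threading


def pull_and_compute(remote_chunks, compute, cache_slots=2):
    """Pull chunks into a bounded cache on one thread while the caller's
    thread reads from the cache and computes, so both overlap in time."""
    cache = queue.Queue(maxsize=cache_slots)  # stand-in for the local cache
    done = object()                           # sentinel: no more chunks

    def puller():
        for chunk in remote_chunks:           # stand-in for remote reads
            cache.put(chunk)                  # blocks while the cache is full
        cache.put(done)

    worker = threading.Thread(target=puller, daemon=True)
    worker.start()

    results = []
    while True:
        chunk = cache.get()                   # cache -> memory
        if chunk is done:
            break
        results.append(compute(chunk))
    worker.join()
    return results


# Hypothetical chunks of values for three grids; the computation is a sum.
totals = pull_and_compute([[1, 2], [3, 4], [5]], compute=sum)
```

The bounded queue keeps the puller at most `cache_slots` chunks ahead of the computation, so cache usage stays limited while remote reads and computation overlap.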
S8. The target computing node stores the computing result in the local cache and sends computation completion information to the data arrangement layer.
S9. The data arrangement layer obtains the computing result from the local cache according to the computation completion information, and determines the distributed storage node to which the data is to be written.
S10. The data arrangement layer converts the computing result according to the storage form of the distributed storage node to be written to obtain a converted computing result, and asynchronously writes the converted computing result to that distributed storage node.
In an optional implementation, the data arrangement layer converts the data format or structure of the computing result according to the storage form of the distributed storage node to be written to obtain the converted computing result, so as to meet the requirements of data-driven applications.
The data arrangement layer can also remove redundant or erroneous data so that the pulled data contains only relevant information, and can delete or correct erroneous or inconsistent data, thereby ensuring the accuracy and consistency of the stored data.
In an optional implementation, the distributed storage nodes may take forms such as MinIO (a high-performance, highly available file system service), HDFS (a distributed file system), or the object storage services of cloud vendors. They offer elastic scaling, high-performance reads and writes, consistency guarantees and disaster tolerance, and mainly store the original NetCDF meteorological data, the relational spatiotemporal data obtained by processing the NetCDF meteorological data, and the computing results obtained by analyzing and processing the NetCDF meteorological data.
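The conversion in S10 might be organized as a lookup from storage form to converter, as in the sketch below. The storage-form labels `"object_store"` and `"hdfs"`, the choice of JSON and CSV as target formats, and the record fields are purely hypothetical; the text does not say which format each kind of storage node expects, and no real storage client is called here.

```python
import csv
import io
import json


def drop_duplicates(records):
    """Remove redundant records that repeat the same (grid, time) key."""
    seen = set()
    unique = []
    for record in records:
        key = (record["grid"], record["time"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def to_json_bytes(records):
    return json.dumps(records, sort_keys=True).encode("utf-8")


def to_csv_bytes(records):
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["grid", "time", "value"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue().encode("utf-8")


# Hypothetical mapping from storage form to the converter used before writing.
CONVERTERS = {"object_store": to_json_bytes, "hdfs": to_csv_bytes}


def convert_for(storage_form, records):
    """Clean the computing result, then convert it for the target storage form."""
    return CONVERTERS[storage_form](drop_duplicates(records))


result = [
    {"grid": 0, "time": "2023-10-23T00", "value": 12.5},
    {"grid": 0, "time": "2023-10-23T00", "value": 12.5},  # redundant copy
]
csv_payload = convert_for("hdfs", result)
json_payload = convert_for("object_store", result)
```

Supporting another storage form then only requires registering one more converter, which is one way to accommodate storage nodes of different forms.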
Referring to FIG. 2, a second embodiment of the present invention is as follows:
A NetCDF meteorological data processing terminal, comprising:
a plurality of computing nodes, configured to parse the NetCDF meteorological data distributed by the scheduling node to obtain relational spatiotemporal data, and to store the relational spatiotemporal data in the local caches corresponding to the computing nodes; a target computing node to which a task is assigned is further configured to read the data to be computed for the computing task from the distributed storage nodes through its corresponding local cache, and to pull the data to be computed from the local cache into its own memory for computation to obtain a computing result;
the local cache, configured to asynchronously update the relational spatiotemporal data to the data arrangement layer, and to read the data to be computed for the computing task from the distributed storage nodes;
the data arrangement layer, configured to store the relational spatiotemporal data to the distributed storage nodes;
the scheduling node, configured to receive the computing tasks corresponding to the relational spatiotemporal data, and to distribute the computing tasks to different computing nodes according to the number of available computing nodes to obtain the target computing node;
and the distributed storage nodes, configured to store the relational spatiotemporal data.
Further, the scheduling node is further configured to:
collect NetCDF meteorological data and monitor the availability states of the plurality of computing nodes;
determine the number of available computing nodes according to the availability states, and compute hash values of the file names of the NetCDF meteorological data with a hash function according to the number of available computing nodes;
and group the NetCDF meteorological data according to the hash values to obtain grouped NetCDF meteorological data, and distribute the grouped NetCDF meteorological data to the plurality of available computing nodes.
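The file grouping performed by the scheduling node can be sketched as follows. SHA-256 reduced modulo the number of available nodes is an assumed hash function, chosen because Python's built-in `hash` for strings varies between runs; the file names, node names and function names are hypothetical.

```python
import hashlib


def node_index(file_name, node_count):
    """Hash a file name to an integer in [0, node_count)."""
    digest = hashlib.sha256(file_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def group_files(file_names, available_nodes):
    """Group NetCDF files by hash value and attach each group to one
    available computing node."""
    groups = {node: [] for node in available_nodes}
    for name in file_names:
        node = available_nodes[node_index(name, len(available_nodes))]
        groups[node].append(name)
    return groups


files = ["t2m_2023102300.nc", "t2m_2023102306.nc", "wind_2023102300.nc"]
groups = group_files(files, ["node-a", "node-b"])
```

Because the hash depends only on the file name and the node count, the same file is always routed to the same node as long as the set of available nodes does not change.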
Further, the plurality of computing nodes are specifically configured to:
determine whether the NetCDF meteorological data distributed by the scheduling node conforms to the preset grid precision, and if not, process the NetCDF meteorological data with an interpolation algorithm to obtain NetCDF meteorological data that conforms to the preset grid precision;
divide the NetCDF meteorological data conforming to the preset grid precision into grids by longitude and latitude to obtain a plurality of grids, wherein each grid contains the time series data of the corresponding NetCDF meteorological data conforming to the preset grid precision;
and process the time series data of each grid with an interpolation algorithm to obtain the relational spatiotemporal data.
Further, the scheduling node is specifically configured to:
parse the computing task by task dimension to obtain parsed computing tasks, and issue the parsed computing tasks to the computing nodes in order of task-dimension granularity from small to large;
for each issued parsed computing task, take a modulus according to the number of available computing nodes to obtain a modulus value;
match the modulus value against the sequence numbers of the available computing nodes to obtain a matching result;
and distribute the parsed computing tasks to the available computing nodes according to the matching result to obtain the target computing node.
Further, the target computing node is further configured to:
send a data reading request to the data arrangement layer, wherein the data reading request comprises the data to be computed corresponding to the computing task;
pull the data to be computed from the local cache into its own memory, and compute on the data to be computed to obtain a computing result;
store the computing result in the local cache, and send computation completion information to the data arrangement layer;
the data arrangement layer is further configured to:
read the data to be computed from the distributed storage nodes into the local cache according to the data reading request;
obtain the computing result from the local cache according to the computation completion information, and determine the distributed storage node to which the data is to be written;
and convert the computing result according to the storage form of the distributed storage node to be written to obtain a converted computing result, and asynchronously write the converted computing result to the distributed storage node to be written.
In summary, in the NetCDF meteorological data processing method and terminal provided by the present invention, a plurality of computing nodes parse the NetCDF meteorological data distributed by the scheduling node and store the resulting relational spatiotemporal data in their corresponding local caches. The local caches asynchronously update the relational spatiotemporal data to the data arrangement layer, and the data arrangement layer stores it to the distributed storage nodes. When the scheduling node receives computing tasks corresponding to the relational spatiotemporal data, it distributes them to different computing nodes according to the number of available computing nodes. Each target computing node assigned a task reads the data to be computed from the distributed storage nodes through its corresponding local cache and pulls that data from the local cache into its own memory for computation. Processing the NetCDF meteorological data in parallel on multiple computing nodes thus speeds up processing, and the data arrangement layer, as an intermediate layer between the computing nodes and the distributed storage nodes, provides a unified access point, which improves the efficiency of meteorological data processing. In addition, after a computing task finishes, the computing node stores the computing result in the local cache; the data arrangement layer then obtains the computing result from the local cache, converts it according to the storage form of the distributed storage node, and asynchronously writes the converted computing result to the distributed storage node, so that distributed storage nodes with various storage forms can be accommodated.
The foregoing describes only embodiments of the present invention and does not limit the scope of the invention. All equivalent changes made on the basis of the specification and drawings of the present invention, and any direct or indirect application in related technical fields, likewise fall within the scope of protection of the present invention.

Claims (10)

1. A NetCDF meteorological data processing method, characterized by comprising the following steps:
a plurality of computing nodes parse the NetCDF meteorological data distributed by a scheduling node to obtain relational spatiotemporal data, and store the relational spatiotemporal data in local caches corresponding to the computing nodes;
the local cache asynchronously updates the relational spatiotemporal data to a data arrangement layer, and the data arrangement layer stores the relational spatiotemporal data to distributed storage nodes;
the scheduling node receives computing tasks corresponding to the relational spatiotemporal data, and distributes the computing tasks to different computing nodes according to the number of available computing nodes to obtain a target computing node;
and the target computing node reads the data to be computed corresponding to the computing task from the distributed storage nodes through the local cache corresponding to the target computing node, and pulls the data to be computed from the local cache into its own memory for computation to obtain a computing result.
2. The NetCDF meteorological data processing method according to claim 1, characterized in that, before the plurality of computing nodes parse the NetCDF meteorological data distributed by the scheduling node to obtain the relational spatiotemporal data, the method comprises:
the scheduling node collects NetCDF meteorological data and monitors the availability states of the plurality of computing nodes;
the scheduling node determines the number of available computing nodes according to the availability states, and computes hash values of the file names of the NetCDF meteorological data with a hash function according to the number of available computing nodes;
and the scheduling node groups the NetCDF meteorological data according to the hash values to obtain grouped NetCDF meteorological data, and distributes the grouped NetCDF meteorological data to the plurality of available computing nodes.
3. The NetCDF meteorological data processing method according to claim 1, characterized in that the plurality of computing nodes parsing the NetCDF meteorological data distributed by the scheduling node to obtain the relational spatiotemporal data comprises:
the plurality of computing nodes determine whether the NetCDF meteorological data distributed by the scheduling node conforms to the preset grid precision, and if not, process the NetCDF meteorological data with an interpolation algorithm to obtain NetCDF meteorological data that conforms to the preset grid precision;
the NetCDF meteorological data conforming to the preset grid precision is divided into grids by longitude and latitude to obtain a plurality of grids, wherein each grid contains the time series data of the corresponding NetCDF meteorological data conforming to the preset grid precision;
and the time series data of each grid is processed with an interpolation algorithm to obtain the relational spatiotemporal data.
4. The NetCDF meteorological data processing method according to claim 1, characterized in that distributing the computing tasks to different computing nodes according to the number of available computing nodes to obtain a target computing node comprises:
the scheduling node parses the computing task by task dimension to obtain parsed computing tasks, and issues the parsed computing tasks to the computing nodes in order of task-dimension granularity from small to large;
for each issued parsed computing task, the scheduling node takes a modulus according to the number of available computing nodes to obtain a modulus value;
the modulus value is matched against the sequence numbers of the available computing nodes to obtain a matching result;
and the parsed computing tasks are distributed to the available computing nodes according to the matching result to obtain the target computing node.
5. The NetCDF meteorological data processing method according to claim 1, characterized in that the target computing node reading the data to be computed corresponding to the computing task from the distributed storage nodes through the local cache corresponding to the target computing node, and pulling the data to be computed from the local cache into its own memory for computation to obtain a computing result comprises:
the target computing node sends a data reading request to the data arrangement layer, wherein the data reading request comprises the data to be computed corresponding to the computing task;
the data arrangement layer reads the data to be computed from the distributed storage nodes into the local cache corresponding to the target computing node according to the data reading request;
the target computing node pulls the data to be computed from the local cache into its own memory, and computes on the data to be computed to obtain a computing result;
after the target computing node reads the data to be computed corresponding to the computing task from the distributed storage nodes through the local cache corresponding to the target computing node and pulls the data to be computed from the local cache into its own memory for computation to obtain the computing result, the method comprises:
the target computing node stores the computing result in the local cache and sends computation completion information to the data arrangement layer;
the data arrangement layer obtains the computing result from the local cache according to the computation completion information, and determines the distributed storage node to which the data is to be written;
the data arrangement layer converts the computing result according to the storage form of the distributed storage node to be written to obtain a converted computing result, and asynchronously writes the converted computing result to the distributed storage node to be written.
6. A NetCDF meteorological data processing terminal, comprising:
a plurality of computing nodes, configured to parse the NetCDF meteorological data distributed by the scheduling node to obtain relational spatiotemporal data, and to store the relational spatiotemporal data in the local caches corresponding to the computing nodes; a target computing node to which a task is assigned is further configured to read the data to be computed corresponding to the computing task from the distributed storage nodes through the local cache corresponding to the target computing node, and to pull the data to be computed from the local cache into its own memory for computation to obtain a computing result;
the local cache, configured to asynchronously update the relational spatiotemporal data to the data arrangement layer, and to read the data to be computed corresponding to the computing task from the distributed storage nodes;
the data arrangement layer, configured to store the relational spatiotemporal data to the distributed storage nodes;
the scheduling node, configured to receive the computing tasks corresponding to the relational spatiotemporal data, and to distribute the computing tasks to different computing nodes according to the number of available computing nodes to obtain the target computing node;
and the distributed storage nodes, configured to store the relational spatiotemporal data.
7. The NetCDF meteorological data processing terminal according to claim 6, characterized in that the scheduling node is further configured to:
collect NetCDF meteorological data and monitor the availability states of the plurality of computing nodes;
determine the number of available computing nodes according to the availability states, and compute hash values of the file names of the NetCDF meteorological data with a hash function according to the number of available computing nodes;
and group the NetCDF meteorological data according to the hash values to obtain grouped NetCDF meteorological data, and distribute the grouped NetCDF meteorological data to the plurality of available computing nodes.
8. The NetCDF meteorological data processing terminal according to claim 6, characterized in that the plurality of computing nodes are specifically configured to:
determine whether the NetCDF meteorological data distributed by the scheduling node conforms to the preset grid precision, and if not, process the NetCDF meteorological data with an interpolation algorithm to obtain NetCDF meteorological data that conforms to the preset grid precision;
divide the NetCDF meteorological data conforming to the preset grid precision into grids by longitude and latitude to obtain a plurality of grids, wherein each grid contains the time series data of the corresponding NetCDF meteorological data conforming to the preset grid precision;
and process the time series data of each grid with an interpolation algorithm to obtain the relational spatiotemporal data.
9. The NetCDF meteorological data processing terminal according to claim 6, characterized in that the scheduling node is specifically configured to:
parse the computing task by task dimension to obtain parsed computing tasks, and issue the parsed computing tasks to the computing nodes in order of task-dimension granularity from small to large;
for each issued parsed computing task, take a modulus according to the number of available computing nodes to obtain a modulus value;
match the modulus value against the sequence numbers of the available computing nodes to obtain a matching result;
and distribute the parsed computing tasks to the available computing nodes according to the matching result to obtain the target computing node.
10. The NetCDF meteorological data processing terminal according to claim 6, characterized in that the target computing node is further configured to:
send a data reading request to the data arrangement layer, wherein the data reading request comprises the data to be computed corresponding to the computing task;
pull the data to be computed from the local cache into its own memory, and compute on the data to be computed to obtain a computing result;
store the computing result in the local cache, and send computation completion information to the data arrangement layer;
the data arrangement layer is further configured to:
read the data to be computed from the distributed storage nodes into the local cache according to the data reading request;
obtain the computing result from the local cache according to the computation completion information, and determine the distributed storage node to which the data is to be written;
and convert the computing result according to the storage form of the distributed storage node to be written to obtain a converted computing result, and asynchronously write the converted computing result to the distributed storage node to be written.
CN202311370337.7A 2023-10-23 2023-10-23 NetCDF meteorological data processing method and terminal Active CN117131000B (en)

Publications (2)

Publication Number Publication Date
CN117131000A true CN117131000A (en) 2023-11-28
CN117131000B CN117131000B (en) 2023-12-29

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110012080A (en) * 2019-03-21 2019-07-12 新华三技术有限公司 Data processing method
US10445334B1 (en) * 2015-09-11 2019-10-15 Amazon Technologies, Inc. Query transmission in type-limited interchange formats
CN111723113A (en) * 2020-06-19 2020-09-29 深圳前海微众银行股份有限公司 Distributed caching method and device for business data, terminal equipment and storage medium
CN112991475A (en) * 2021-05-17 2021-06-18 航天宏图信息技术股份有限公司 Method and device for acquiring remote sensing image
CN116594977A (en) * 2023-05-10 2023-08-15 阿里巴巴达摩院(杭州)科技有限公司 Distributed processing system for remote sensing data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
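Tan Kaizhong et al.: "Process-oriented distributed storage and parallel retrieval of marine spatiotemporal data", Periodical of Ocean University of China (Natural Science Edition), vol. 51, no. 11, pages 94-101 *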

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant