CN116126552A - Mass meteorological observation data processing method and device based on Storm
- Publication number
- CN116126552A (application CN202211678249.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- bufr
- file
- decoding
- warehousing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/542—Event management; Broadcasting; Multicasting; Notifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24578—Query processing with adaptation to user needs using ranking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/547—Messaging middleware
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/548—Queue
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Radar Systems Or Details Thereof (AREA)
Abstract
The invention discloses a Storm-based mass meteorological observation data processing method and device, and belongs to the field of meteorological information processing. The invention designs a new Storm processing framework that directly consumes BUFR-format messages from RabbitMQ, reducing the intermediate links between transmission and warehousing. Data are processed as they arrive, with each BUFR message forming one Tuple, so that the whole flow from transmission to warehousing runs without writing intermediate files to disk. Multiple Bolts run in parallel, which greatly improves processing timeliness.
Description
Technical Field
The invention relates to the field of meteorological information processing, in particular to a Storm-based mass meteorological observation data processing method and device.
Background
With the continuous development of meteorological observation technology, both the number of meteorological observation stations and their observation frequency keep increasing. Ground meteorological observation stations are mainly divided into national stations and regional stations, and their total number reaches 70,000. Of these, 2,400 national stations observe every minute and all regional stations observe every 5 minutes, and more regional stations will move to 1-minute observation in the future. As the number and frequency of stations increase, weather forecasting also places higher demands on the timeliness of data services: the time within which data must reach the forecaster's desktop has been shortened from 3 minutes to 1 minute after observation. The ever-increasing volume of observation data and the stricter timeliness requirements place higher demands on the transmission and processing of meteorological observations.
To improve transmission timeliness, the national stations and the regional stations have successively switched from file transmission to message transmission, and have essentially achieved an end-to-end data transmission service. Data processing is the intermediate link that connects transmission with service; it must follow this change and provide fast processing under the message transmission mode in order to meet continuously developing service requirements.
The roughly 70,000 automatic ground weather observation stations originally transmitted their data by FTP in a plain text format. The original Storm processing framework can therefore only work in a file-plus-notification mode, and cannot directly parse RabbitMQ (rmq) messages in BUFR format.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a Storm-based mass meteorological observation data processing method and device, which directly consume BUFR-format messages from RabbitMQ, reduce the intermediate links between transmission and warehousing, process data as they arrive, and greatly improve processing timeliness.
The technical scheme provided by the invention is as follows:
A Storm-based mass meteorological observation data processing method, the method comprising:
S100: a Spout of the Storm framework obtains message data from a RabbitMQ message queue and converts the message data into Tuples;
wherein the RabbitMQ message queue comprises a BUFR data message queue, a BUFR file message queue and a text file message queue, whose message data are BUFR data messages, BUFR file notification messages and text file notification messages respectively, and the BUFR file notification messages and the text file notification messages each comprise a four-level code and a full-path file name;
S200: transmitting the Tuple converted from the BUFR data message to a first decoding Bolt and a first warehousing Bolt for decoding and warehousing;
S300: transmitting the Tuple converted from the BUFR file notification message to a second decoding Bolt, wherein the second decoding Bolt reads the content of the BUFR file from the shared file system according to the full-path file name contained in the BUFR file notification message, decodes the content, and warehouses the decoded content through a second warehousing Bolt after decoding is completed;
S400: transmitting the Tuple converted from the text file notification message to a third decoding Bolt, wherein the third decoding Bolt reads the text file content from the shared file system according to the full-path file name contained in the text file notification message, decodes the content, and warehouses the decoded content through a third warehousing Bolt after decoding is completed.
Further, the first decoding Bolt and the first warehousing Bolt are integrated into a single first decoding-warehousing Bolt; the BUFR data message queue is divided into an hour BUFR data message queue and a minute BUFR data message queue; the hour BUFR data messages and the minute BUFR data messages are converted into Tuples by their respective Spouts and sent to their respective first decoding-warehousing Bolts for decoding and warehousing.
Further, the operations of the first decoding-warehousing Bolt comprise, in order: format check, element decoding, missing-value check, time check, characteristic value conversion, warehousing SQL statement generation, correction report processing and data warehousing.
Further, the BUFR data message comprises, in order, an indication section, an identification section, an optional section, a data description section, a data section and an end section;
the format check includes: judging whether the indication section starts with the character string "BUFR"; judging whether the field in the indication section that represents the total length of the BUFR data message matches the actual total length of the BUFR data message; judging whether the field representing the length of each section matches the actual length of that section; performing a byte check calculation according to the predefined data descriptors in the data description section, and judging whether the calculated length is consistent with the actual length; if all judgments pass, the format check passes and the element decoding operation is executed; otherwise, the format check fails and the element decoding operation is not executed;
and/or;
the element decoding includes: invoking the corresponding decoding algorithm to decode the elements, obtaining metadata information comprising observation time, station number, station, longitude, latitude and altitude, and element values comprising temperature, air pressure, wind direction, wind speed, humidity and precipitation;
and/or;
the missing-value check includes: setting any missing element value to a preset missing value;
and/or;
the time check includes: discarding data whose decoded observation time is more than 24 hours ahead of the current time, and data whose decoded observation time is more than 672 hours behind the current time;
and/or;
the characteristic value conversion includes: setting specific element values to preset characteristic values;
and/or;
the warehousing SQL statement generation includes: reading the configured table name from the configuration file, and mapping warehousing fields to element values in code according to pre-designed warehousing rules to generate the warehousing SQL statement; wherein the warehousing SQL statement further comprises management fields read from the configuration file and fields derived from system functions and attribute information;
and/or;
the correction report processing includes: when the parsed BBB item is not '000', warehousing is carried out only when the corresponding original BBB item in the database is smaller than the parsed BBB item;
and/or;
the data warehousing includes: calling the corresponding warehousing interface to warehouse the corresponding data.
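By way of non-limiting illustration, three of the checks above can be sketched in Python. The function names are hypothetical. The sketch assumes the WMO BUFR edition 3/4 layout of the indication section (octets 1-4 hold "BUFR", octets 5-7 hold the total length), reads the time check as a window from 672 hours before to 24 hours after the current time, and assumes that a record with BBB "000", or one with no earlier record in the database, is always stored; none of these assumptions is stated in this form in the text.

```python
from datetime import datetime, timedelta
from typing import Optional


def check_indicator_section(message: bytes) -> bool:
    """Check the "BUFR" prefix and the declared total length.

    Assumes octets 5-7 of the indication section hold the total
    message length as a big-endian integer (BUFR edition 3/4).
    """
    if len(message) < 8 or message[:4] != b"BUFR":
        return False
    declared_length = int.from_bytes(message[4:7], "big")
    return declared_length == len(message)


def check_time(obs_time: datetime, now: datetime,
               ahead_hours: int = 24, behind_hours: int = 672) -> bool:
    """Return True when obs_time is at most ahead_hours in the future
    and at most behind_hours in the past, relative to now."""
    earliest = now - timedelta(hours=behind_hours)
    latest = now + timedelta(hours=ahead_hours)
    return earliest <= obs_time <= latest


def should_store(parsed_bbb: str, stored_bbb: Optional[str]) -> bool:
    """Apply the correction-report rule to one decoded record.

    Assumption: a "000" record, or a record with no earlier entry in
    the database (stored_bbb is None), is always stored.
    """
    if parsed_bbb == "000" or stored_bbb is None:
        return True
    # Plain string comparison; the values "CCA" and "CCB" used in the
    # usage note are illustrative only.
    return stored_bbb < parsed_bbb
```

For example, `should_store("CCB", "CCA")` accepts a newer correction, while `should_store("CCA", "CCB")` rejects an older one that arrives late.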
Further, the second decoding Bolt and the second warehousing Bolt are integrated into a single second decoding-warehousing Bolt; the BUFR file message queue is divided into an hour BUFR file message queue and a minute BUFR file message queue; the hour BUFR file notification messages and the minute BUFR file notification messages are converted into Tuples by their respective Spouts and sent to their respective second decoding-warehousing Bolts, which decode and warehouse the respective BUFR file contents.
Further, the third warehousing Bolt comprises an hour warehousing Bolt and a minute warehousing Bolt;
the S400 includes:
S410: transmitting the Tuple converted from the text file notification message to the third decoding Bolt, wherein the third decoding Bolt reads the text file content from the shared file system according to the full-path file name contained in the text file notification message and decodes the text file content;
S420: sending decoded data whose observation time has a minute value of 0 to the hour warehousing Bolt for warehousing, and sending decoded data whose observation time has a non-zero minute value to the minute warehousing Bolt for warehousing.
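The routing rule of S420 can be illustrated by a minimal Python sketch. The function name and the stream labels "hour" and "minute" are hypothetical and chosen only for illustration.

```python
from datetime import datetime


def route_by_minute(obs_time: datetime) -> str:
    """Pick the target warehousing Bolt from the observation time.

    Records observed exactly on the hour (minute == 0) go to the hour
    warehousing Bolt; all other records go to the minute warehousing Bolt.
    """
    return "hour" if obs_time.minute == 0 else "minute"
```

A record observed at 12:00 would thus be routed to the hour warehousing Bolt, and one observed at 12:55 to the minute warehousing Bolt.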
Further, BUFR data messages are warehoused by submitting records one by one;
for BUFR files and text files, the decoded data are recorded in a cache list, and the data in the cache list are submitted for warehousing in a batch once the number of records in the cache list reaches a set number or once decoding of the BUFR file or text file is complete;
when a batch submission fails, the data in the cache list are submitted one by one for warehousing.
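The batch submission with one-by-one fallback described above can be sketched as follows. This is a minimal sketch only: the class name `BatchWriter` and the callables `insert_many` and `insert_one` are hypothetical stand-ins for the warehousing interface, and the sketch assumes that a failed batch insert leaves nothing in the database (for example because it is rolled back), so that retrying record by record does not create duplicates.

```python
class BatchWriter:
    """Cache decoded records and submit them in batches."""

    def __init__(self, insert_many, insert_one, batch_size=500):
        self.insert_many = insert_many  # submits a whole list at once
        self.insert_one = insert_one    # submits a single record
        self.batch_size = batch_size
        self.cache = []                 # the cache list of decoded records
        self.failed = []                # records that failed even one by one

    def add(self, record):
        """Cache one record; flush when the set number is reached."""
        self.cache.append(record)
        if len(self.cache) >= self.batch_size:
            self.flush()

    def flush(self):
        """Submit the cached records; call this when a file is fully decoded."""
        if not self.cache:
            return
        pending, self.cache = self.cache, []
        try:
            self.insert_many(pending)
        except Exception:
            # Batch failed: fall back to submitting records one by one so
            # that a single bad record does not block the rest.
            for record in pending:
                try:
                    self.insert_one(record)
                except Exception:
                    self.failed.append(record)
```

A decoding Bolt would call `add()` for each decoded record and `flush()` once the file has been fully decoded, which covers both trigger conditions named in the text.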
Further, the method further comprises:
S500: generating DI information according to the warehousing state and sending the DI information to the corresponding DIEI processing Bolt, which caches the DI information and sends it to the astronomical mirror in batches once the cached DI information reaches a set quantity or a set time period has elapsed.
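The caching behaviour of the DIEI processing Bolt in S500 can be sketched as a buffer that is flushed on whichever comes first, a set quantity or a set time period. The class name `DiBuffer`, the `send` callable and the default thresholds are hypothetical; the clock is injectable so that the time condition can be exercised without waiting.

```python
import time


class DiBuffer:
    """Cache DI records and send them in batches."""

    def __init__(self, send, max_items=100, max_seconds=5.0,
                 clock=time.monotonic):
        self.send = send                # called with a list of DI records
        self.max_items = max_items      # the set quantity
        self.max_seconds = max_seconds  # the set time period
        self.clock = clock
        self._items = []
        self._first_at = None           # arrival time of the oldest cached item

    def add(self, di):
        """Cache one DI record and flush if a threshold is reached."""
        if not self._items:
            self._first_at = self.clock()
        self._items.append(di)
        self.maybe_flush()

    def maybe_flush(self):
        """Send the cached batch if either threshold is reached.

        Returns True when a batch was sent.
        """
        if not self._items:
            return False
        full = len(self._items) >= self.max_items
        expired = self.clock() - self._first_at >= self.max_seconds
        if full or expired:
            batch, self._items = self._items, []
            self.send(batch)
            return True
        return False
```

Because the time condition is only evaluated when `maybe_flush()` runs, a real Bolt would need some periodic trigger to call it even when no new DI information arrives.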
Further, the method further comprises:
directly acquiring station information from the public metadata system through a station information update processing Bolt, comparing it with the station information in the local file, and applying the changed station information to the Storm framework.
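The comparison step of the station information update can be illustrated as a dictionary difference. The function name, the station identifiers and the attribute names below are hypothetical; the text does not specify how station information is represented.

```python
def diff_stations(local, remote):
    """Compare local station information with freshly fetched information.

    Both arguments map a station id to a dict of attributes.
    Returns (changed, removed): stations that are new or whose attributes
    differ, and ids that no longer appear in the remote data.
    """
    changed = {sid: info for sid, info in remote.items()
               if local.get(sid) != info}
    removed = sorted(set(local) - set(remote))
    return changed, removed
```

Only the entries in `changed` (and, if desired, `removed`) would then need to be applied to the running framework, rather than reloading every station.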
A Storm-based mass meteorological observation data processing apparatus, the apparatus comprising:
a data acquisition module, configured for a Spout of the Storm framework to obtain message data from a RabbitMQ message queue and convert the message data into Tuples;
wherein the RabbitMQ message queue comprises a BUFR data message queue, a BUFR file message queue and a text file message queue, whose message data are BUFR data messages, BUFR file notification messages and text file notification messages respectively, and the BUFR file notification messages and the text file notification messages each comprise a four-level code and a full-path file name;
a first decoding and warehousing module, configured to transmit the Tuple converted from the BUFR data message to a first decoding Bolt and a first warehousing Bolt for decoding and warehousing;
a second decoding and warehousing module, configured to transmit the Tuple converted from the BUFR file notification message to a second decoding Bolt, wherein the second decoding Bolt reads the BUFR file content from the shared file system according to the full-path file name contained in the BUFR file notification message, decodes the content, and warehouses the decoded content through a second warehousing Bolt after decoding is completed;
and a third decoding and warehousing module, configured to transmit the Tuple converted from the text file notification message to a third decoding Bolt, wherein the third decoding Bolt reads the text file content from the shared file system according to the full-path file name contained in the text file notification message, decodes the content, and warehouses the decoded content through a third warehousing Bolt after decoding is completed.
Further, the first decoding Bolt and the first warehousing Bolt are integrated into a single first decoding-warehousing Bolt; the BUFR data message queue is divided into an hour BUFR data message queue and a minute BUFR data message queue; the hour BUFR data messages and the minute BUFR data messages are converted into Tuples by their respective Spouts and sent to their respective first decoding-warehousing Bolts for decoding and warehousing.
Further, the operations of the first decoding-warehousing Bolt comprise, in order: format check, element decoding, missing-value check, time check, characteristic value conversion, warehousing SQL statement generation, correction report processing and data warehousing.
Further, the BUFR data message comprises, in order, an indication section, an identification section, an optional section, a data description section, a data section and an end section;
the format check includes: judging whether the indication section starts with the character string "BUFR"; judging whether the field in the indication section that represents the total length of the BUFR data message matches the actual total length of the BUFR data message; judging whether the field representing the length of each section matches the actual length of that section; performing a byte check calculation according to the predefined data descriptors in the data description section, and judging whether the calculated length is consistent with the actual length; if all judgments pass, the format check passes and the element decoding operation is executed; otherwise, the format check fails and the element decoding operation is not executed;
and/or;
the element decoding includes: invoking the corresponding decoding algorithm to decode the elements, obtaining metadata information comprising observation time, station number, station, longitude, latitude and altitude, and element values comprising temperature, air pressure, wind direction, wind speed, humidity and precipitation;
and/or;
the missing-value check includes: setting any missing element value to a preset missing value;
and/or;
the time check includes: discarding data whose decoded observation time is more than 24 hours ahead of the current time, and data whose decoded observation time is more than 672 hours behind the current time;
and/or;
the characteristic value conversion includes: setting specific element values to preset characteristic values;
and/or;
the warehousing SQL statement generation includes: reading the configured table name from the configuration file, and mapping warehousing fields to element values in code according to pre-designed warehousing rules to generate the warehousing SQL statement; wherein the warehousing SQL statement further comprises management fields read from the configuration file and fields derived from system functions and attribute information;
and/or;
the correction report processing includes: when the parsed BBB item is not '000', warehousing is carried out only when the corresponding original BBB item in the database is smaller than the parsed BBB item;
and/or;
the data warehousing includes: calling the corresponding warehousing interface to warehouse the corresponding data.
Further, the second decoding Bolt and the second warehousing Bolt are integrated into a single second decoding-warehousing Bolt; the BUFR file message queue is divided into an hour BUFR file message queue and a minute BUFR file message queue; the hour BUFR file notification messages and the minute BUFR file notification messages are converted into Tuples by their respective Spouts and sent to their respective second decoding-warehousing Bolts, which decode and warehouse the respective BUFR file contents.
Further, the third warehousing Bolt comprises an hour warehousing Bolt and a minute warehousing Bolt;
the third decoding and warehousing module comprises:
a decoding unit, configured to transmit the Tuple converted from the text file notification message to the third decoding Bolt, wherein the third decoding Bolt reads the text file content from the shared file system according to the full-path file name contained in the text file notification message and decodes the text file content;
and a warehousing unit, configured to send decoded data whose observation time has a minute value of 0 to the hour warehousing Bolt for warehousing, and to send decoded data whose observation time has a non-zero minute value to the minute warehousing Bolt for warehousing.
Further, BUFR data messages are warehoused by submitting records one by one;
for BUFR files and text files, the decoded data are recorded in a cache list, and the data in the cache list are submitted for warehousing in a batch once the number of records in the cache list reaches a set number or once decoding of the BUFR file or text file is complete;
when a batch submission fails, the data in the cache list are submitted one by one for warehousing.
Further, the device further comprises:
a monitoring module, configured to generate DI information according to the warehousing state and send the DI information to the corresponding DIEI processing Bolt, which caches the DI information and sends it to the astronomical mirror in batches once the cached DI information reaches a set quantity or a set time period has elapsed.
Further, the device further comprises:
a station information update processing module, configured to periodically acquire station information directly from the public metadata system through a station information update processing Bolt, compare it with the station information in the local file, and apply the changed station information to the Storm framework.
The invention has the following beneficial effects:
The invention designs a new Storm processing framework that directly consumes BUFR-format messages from RabbitMQ, reducing the intermediate links between transmission and warehousing. Data are processed as they arrive, with each BUFR message forming one Tuple, so that the whole flow from transmission to warehousing runs without writing intermediate files to disk. Multiple Bolts run in parallel, which greatly improves processing timeliness.
The Storm processing framework gives the data processing higher reliability: when a Worker, Spout, Bolt, Acker or other component processing the data fails unexpectedly, the framework restarts it automatically to recover; even when a node fails completely, the processing tasks running on that node are automatically migrated to other nodes. When the existing multi-process processing encounters similar conditions, recovery requires manual intervention, which is time-consuming and labor-intensive.
Because Storm is designed specifically for highly concurrent data processing, the numbers of Workers, Spouts, Bolts and Ackers started when Storm processes data are configurable, and can be adjusted quickly and conveniently according to the data volume and the available server resources so as to maximize processing efficiency.
Drawings
FIG. 1 is a flow chart of a Storm-based mass meteorological observation data processing method of the invention;
FIG. 2 is a schematic diagram of a Storm cluster architecture;
FIG. 3 is a schematic deployment diagram of a Storm cluster architecture;
FIG. 4 is a topology diagram of the Storm stream processing framework for BUFR data messages;
FIG. 5 is a topology diagram of the Storm stream processing framework for BUFR file data;
FIG. 6 is a topology diagram of the Storm stream processing framework for text file (Z file) data;
FIG. 7 is a schematic diagram of a Spout of the Storm framework receiving message data from a RabbitMQ message queue;
FIG. 8 is a schematic diagram comparing the timeliness of the Storm processing of the present invention with that of the existing multithreaded processing;
FIG. 9 is a schematic diagram of a Storm-based mass meteorological observation data processing apparatus of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages to be solved more clear, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings and specific embodiments. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.
The embodiment of the invention provides a Storm-based mass meteorological observation data processing method. A Storm cluster consists of roles such as Nimbus, Supervisor and Zookeeper, as shown in FIG. 2.
Nimbus: the master node, which manages, coordinates and monitors the topologies running on the cluster, records the state of all Supervisors and distributes tasks.
Supervisor: waits for Nimbus to assign tasks, then spawns and monitors Workers (JVM processes).
Zookeeper: a third-party service that coordinates state information within the cluster.
The Storm technical framework consists of concepts such as Topology, Tuple, Stream, Spout, Bolt and Worker, wherein:
Topology: the distributed computing structure of Storm, composed of Streams, Spouts and Bolts; once started, it runs until it is killed.
Tuple: the core data structure of Storm, containing a list of one or more key-value pairs.
Stream: an unbounded sequence of Tuples.
Spout: the source of a Stream in Storm; it connects to an external data source, converts the data into individual Tuples and emits them.
Bolt: an operation or function in the computation; it takes one or more data streams as input, operates on the data, and optionally outputs one or more data streams.
Worker: a JVM process running on a node, which internally runs one or more instances of a Spout or Bolt.
Based on the STORM technical framework, the method of the invention is shown in FIG. 1 and comprises the following steps:
S100: a Spout of the Storm framework obtains message data from the RabbitMQ message queue and converts the message data into Tuples.
To improve the timeliness of observations from ground automatic stations, the China Meteorological Administration carried out a corresponding nationwide service adjustment in 2021-2022: the original FTP file transmission was replaced by RabbitMQ message queue transmission, and the data format was changed from the original plain text file to the binary BUFR format. As a result, ground observations from national and regional stations are transmitted with second-level delay (measured from station upload to the switching system), an improvement over the delay of the original Z-file transmission. On this basis, the method processes the mass meteorological observation data transmitted through the RabbitMQ message queue with a Storm stream processing framework.
At present, meteorological observation data from China's national and regional stations come in a text format and a BUFR format. The text format is mainly the original Z file of national stations, unmanned stations, regional stations, rainfall stations and the like; it is the original standard service transmission format and is referred to herein as a text file. BUFR-format data take two forms: pure data messages and data files. Pure data messages are the primary form for station-province-national uploads and are referred to herein as BUFR data messages. BUFR data files are the main form in which the provincial-level Tianqing system accesses automatic station data from other provinces; their content format is identical to that of the messages, and they are referred to herein as BUFR files.
For BUFR data messages, BUFR files and text files, they are transmitted through respective corresponding RabbitMQ message queues.
The RabbitMQ message queue comprises a BUFR data message queue, a BUFR file message queue and a text file message queue, wherein message data of the BUFR data message queue, the BUFR file message queue and the text file message queue are BUFR data message, BUFR file notification message and text file notification message respectively.
The BUFR data message comprises weather data information, namely the weather data information of the BUFR data message is directly transmitted through a RabbitMQ message queue; meteorological data information of the BUFR file and the text file is not directly transmitted through the RabbitMQ message queue, but is stored in a shared file system through the BUFR file and the text file, and the BUFR file notification message and the text file notification message comprise respective four-level codes and full-path file names.
The BUFR file notification message has the format "four-level code:full-path file name", for example:
A.0001.0040.R001:/space/dpc/work/data/A/A.0001.0040.R001/202108/2021083112/Z_SURF_I_X2021_20210831125500_O_AWS_FTM_PQC.txt
A.0001.0040.R001 is the CTS four-level code, a data type code designed within the meteorological department; data of the same type generally follow similar file naming rules. The full-path file name follows the colon. The BUFR file content and text file content, which contain the meteorological data information, can be read from the shared file system according to the full-path file name.
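The notification format described above can be split as in the following minimal sketch. The names `Notification` and `parse_notification` are hypothetical and do not come from the invention; the sketch only assumes that the four-level code itself never contains a colon.

```python
from typing import NamedTuple


class Notification(NamedTuple):
    cts_code: str  # four-level code, e.g. "A.0001.0040.R001"
    path: str      # full-path file name on the shared file system


def parse_notification(message: str) -> Notification:
    """Split 'four-level code:full-path file name' at the first colon."""
    code, sep, path = message.strip().partition(":")
    if not sep or not code or not path:
        raise ValueError(f"malformed notification message: {message!r}")
    return Notification(code, path)
```

A decoding Bolt could use `cts_code` to select the decoding module and `path` to read the file from the shared file system.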
The Spout of the Storm framework is responsible for connecting to the external data source, acquiring message data from RabbitMQ, and converting the message data into Tuples, the core data structure of Storm. Because different forms of message data are transmitted through different message queues (i.e., the BUFR data message queue, the BUFR file message queue and the text file message queue), the Storm framework includes a plurality of Spouts, each responsible for interfacing with one of the message queues.
S200: The Tuple converted from the BUFR data message is sent to a first decoding Bolt and a first warehousing Bolt for decoding and warehousing.
Because the BUFR data message queue carries data messages that contain the meteorological observation data, the corresponding observation elements can be obtained directly by decoding and then warehoused.
S300: The Tuple converted from the BUFR file notification message is sent to a second decoding Bolt. The second decoding Bolt reads the BUFR file content from the shared file system according to the full-path file name contained in the BUFR file notification message and decodes it; after decoding is completed, the data is warehoused through a second warehousing Bolt.
Because the message data transmitted by the BUFR file message queue consists of the four-level code and the full-path file name and does not include meteorological data information, after receiving the Tuple the second decoding Bolt must read the BUFR file content from the shared file system according to the full-path file name, call the decoding module corresponding to the four-level code to decode the data, assemble the warehousing statement after decoding is completed, and call the warehousing interface to warehouse the data.
S400: The Tuple converted from the text file notification message is sent to a third decoding Bolt. The third decoding Bolt reads the text file content from the shared file system according to the full-path file name contained in the text file notification message and decodes it; after decoding is completed, the data is warehoused through a third warehousing Bolt.
Like the BUFR file, the text file must also be read from the shared file system according to its full-path file name, then decoded and warehoused.
The deployment of the Storm system is shown in FIG. 3. The cluster consists of more than 3 servers. At the top is the Storm cluster, on which the processing topology for ground automatic station data runs. RabbitMQ is used for receiving data messages or notification messages. A shared file system (VCS or NAS) provides consistent data storage for every node in the cluster, so that each server sees the data files under the same path; the shared file system can be omitted when RabbitMQ transmits only pure data messages. The back-end database stores the results of Storm processing for subsequent query and analysis, and may be a relational database such as Xugu, Oracle or MySQL.
The invention designs a new Storm processing framework that interfaces directly with BUFR-format messages from RabbitMQ and reduces the intermediate links between transmission and warehousing. Data is processed one message at a time, each BUFR message being one Tuple, so the whole flow from transmission to warehousing is completed without landing the data on disk. A plurality of Bolts run in parallel, which greatly improves processing timeliness.
Adopting the Storm processing framework also gives data processing higher reliability. When a Worker, Spout, Bolt, Acker or similar component that processes data fails unexpectedly, the framework restarts it to recover; even when a node goes down completely, the processing tasks running on that node are automatically migrated to other nodes. The existing multi-process processing can only recover from similar situations through manual intervention, which is time-consuming and labor-intensive.
Because Storm is designed for highly concurrent data processing, the numbers of Workers, Spouts, Bolts and Ackers started when processing data are configurable, and can be adjusted quickly according to the data volume and the available server resources to achieve maximum processing efficiency.
To reduce the communication overhead between different Bolts in Storm, the invention integrates the first decoding Bolt and the first warehousing Bolt into a single first decoding and warehousing Bolt, as shown in FIG. 4.
The meteorological observation data comprise hour data and minute data. The hour data and minute data of BUFR data messages are separated: they are transmitted through their own message queues, received and converted by their own Spouts, and decoded and warehoused by their own decoding and warehousing Bolts. That is, the BUFR data message queue is divided into an hour BUFR data message queue and a minute BUFR data message queue; the hour BUFR data messages and minute BUFR data messages are converted into Tuples by their respective Spouts and then sent to their respective first decoding and warehousing Bolts for decoding and warehousing.
The assessment requirements and processing principles differ between the minute data and the hour data of national stations and regional stations, and different message queues are used in the transmission stage. The minute data and the hour data are therefore processed by different Storm topologies, which are independent of each other and easy to manage. Moreover, because the warehousing timeliness requirement for hour data is higher, decoding and warehousing the hour data and minute data of BUFR data separately avoids mutual interference and guarantees the processing timeliness of the hour data.
The operation of the first decoding and warehousing Bolt comprises, in order: format check, element decoding, missing-value check, time check, characteristic value conversion, warehousing SQL statement generation, correction report processing and data warehousing.
Wherein: the BUFR data message follows the WMO coding format specification and is divided into 6 sections, in order: the indicator section (section 0), the identification section (section 1), the optional section (section 2), the data description section (section 3), the data section (section 4) and the end section (section 5). A complete BUFR data message starts with the string "BUFR", i.e. the first 4 bytes of section 0, and ends with the string "7777", i.e. section 5.
The format check includes:
1. Determine whether the indicator section (section 0) starts with the string "BUFR". If not, the message is not valid; if so, the check continues.
2. Determine whether the field in the indicator section that gives the total length of the BUFR data message is consistent with the actual total length of the message.
In section 0, the string "BUFR" is followed by a field giving the total length of the BUFR data message, and the judgment is made with this field. If the lengths do not match, an error is returned; if they match, the check continues.
3. Determine whether the field in each section that gives the length of that section is consistent with the actual length of the section.
The first 3 bytes of each of sections 1 to 4 give the length of that section, and the length of each section is checked against this field. If a length does not match the actual length, the message is treated as wrongly formatted; if all lengths match, the check continues.
4. Calculate the expected byte length from the predefined data descriptors in the data description section and determine whether the calculated length is consistent with the actual length. If they do not match, the data is regarded as erroneous and is not decoded; otherwise the subsequent element decoding is performed.
That is, if checks 1-4 all pass, the format check passes and element decoding is performed; otherwise the format check fails and element decoding is not performed.
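Checks 1 to 3 can be illustrated with the following sketch. It is not the decoder of the invention: it assumes the BUFR edition 4 layout (3-byte total length in octets 5-7 of section 0, and the optional-section flag in the top bit of octet 10 of section 1), the function name `check_bufr_format` is hypothetical, and check 4 (the descriptor-based byte count) is left out because it requires the descriptor tables.

```python
def check_bufr_format(msg: bytes) -> bool:
    """Structural checks 1-3 on a BUFR message (edition 4 layout assumed)."""
    # Check 1: section 0 starts with "BUFR"; section 5 is "7777".
    if len(msg) < 12 or msg[:4] != b"BUFR" or msg[-4:] != b"7777":
        return False
    # Check 2: total-length field (octets 5-7) equals the actual length.
    if int.from_bytes(msg[4:7], "big") != len(msg):
        return False
    # Check 3: walk sections 1-4 using the 3-byte length at each section start.
    pos, end = 8, len(msg) - 4
    has_optional = False
    for section in (1, 2, 3, 4):
        if section == 2 and not has_optional:
            continue  # optional section 2 is absent
        if pos + 3 > end:
            return False
        length = int.from_bytes(msg[pos:pos + 3], "big")
        if length < 3 or pos + length > end:
            return False
        if section == 1:
            if length < 10:
                return False
            # Edition 4 assumption: top bit of octet 10 flags section 2.
            has_optional = bool(msg[pos + 9] & 0x80)
        pos += length
    # The sections must exactly fill the space before "7777".
    return pos == end
```

A message that fails this function would be rejected before any element decoding is attempted.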
The element decoding includes: after the first decoding and warehousing Bolt receives the message, it calls the corresponding decoding algorithm to decode the elements, obtaining metadata information, including observation time, station number, station, longitude, latitude and altitude, and element values, including temperature, air pressure, wind direction, wind speed, humidity and precipitation.
The missing-value check includes: setting missing element values to preset missing values.
According to the BUFR code specification, the missing value of each element is represented differently: an element is missing when all the bits specified by its element descriptor are 1. In the database, missing values are converted to a uniform representation. Values that are missing are uniformly converted to 999999, and values of elements that are not reported are uniformly converted to 999998.
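The missing-value rule can be sketched as follows. The function names are hypothetical, and representing an element that was not reported as `None` is an assumption of this sketch, not something stated in the text.

```python
from typing import Optional

MISSING = 999999       # uniform database value for a missing element
NOT_REPORTED = 999998  # uniform database value for an element not reported


def is_missing(raw: int, bit_width: int) -> bool:
    """BUFR marks a value as missing when all of its bits are 1."""
    return raw == (1 << bit_width) - 1


def normalise(raw: Optional[int], bit_width: int) -> int:
    """Map a raw field to the uniform database representation."""
    if raw is None:  # assumption: None stands for "not reported"
        return NOT_REPORTED
    if is_missing(raw, bit_width):
        return MISSING
    return raw
```

For a 12-bit field, the raw value 4095 (all bits set) is therefore stored as 999999, while any other raw value passes through unchanged.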
The time check includes: discarding data whose decoded observation time is more than 24 hours ahead of the current time, and data whose decoded observation time is more than 672 hours (28 days) earlier than the current time.
Owing to problems in the observation data itself, historical data from long ago, or data whose time is ahead of the current time, is often uploaded to the national level; both are abnormal data. A large amount of historical data causes many warehousing conflicts, slows warehousing and affects the warehousing of real-time data, while data ahead of the current time confuses users. To handle such abnormal data, the processing framework adds a time check on the observation time obtained after element decoding. The parameters are set as follows:
Data warehousing time range check, unit: hours.
D_date_best_day=24 // data more than 24 hours ahead of the current time is not allowed to be warehoused.
d_date_after_day=672 // historical data from more than 672 hours ago is not allowed to be warehoused.
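The time window above can be expressed as a small predicate. This is a sketch; the function and parameter names are hypothetical and merely mirror the two configured hour limits.

```python
from datetime import datetime, timedelta


def passes_time_check(
    obs_time: datetime,
    now: datetime,
    max_ahead_hours: int = 24,     # limit for data ahead of the current time
    max_history_hours: int = 672,  # limit for historical data (28 days)
) -> bool:
    """Return True when the observation time lies inside the allowed window."""
    if obs_time > now + timedelta(hours=max_ahead_hours):
        return False  # too far ahead of the current time
    if obs_time < now - timedelta(hours=max_history_hours):
        return False  # too old
    return True
```

Records for which the predicate returns False would be discarded before the warehousing statement is generated.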
The characteristic value conversion includes: setting specific element values to preset characteristic values.
Some special element values must be converted according to the corresponding observation specifications and warehoused as characteristic values. For example, when the wind speed <= 0.2, the corresponding wind direction value is converted to the calm-wind characteristic value 999017; when the precipitation amount is -1, the corresponding element value is converted to the trace-precipitation characteristic value 999990.
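The two example rules can be sketched as below. The dictionary keys `wind_speed`, `wind_direction` and `precipitation` are hypothetical field names chosen for illustration; only the thresholds and characteristic values come from the text.

```python
CALM_WIND = 999017            # characteristic value for calm wind
TRACE_PRECIPITATION = 999990  # characteristic value for trace precipitation


def apply_characteristic_values(elements: dict) -> dict:
    """Return a copy of the decoded elements with characteristic values applied."""
    out = dict(elements)
    wind_speed = out.get("wind_speed")
    if wind_speed is not None and wind_speed <= 0.2 and "wind_direction" in out:
        out["wind_direction"] = CALM_WIND
    if out.get("precipitation") == -1:
        out["precipitation"] = TRACE_PRECIPITATION
    return out
```

The function leaves its input unchanged, so the decoded values remain available for logging or checks.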
The generation of the warehousing SQL statement includes: after all elements have been decoded and have passed the time check, missing-value check and characteristic value conversion, the final element values are the values to be warehoused, and the corresponding insert statement can be assembled.
First, the configured table name is read from the configuration file; then, according to the pre-designed warehousing rules, the warehousing fields are mapped to element values in code; finally, an SQL warehousing statement is generated, such as: insert into table_name (field_name_1, field_name_2, ...) values (value_1, value_2, ...).
In addition to the converted element values above, some management fields in the warehousing SQL statement are read from configuration. For example, the administrative division field, which gives the administrative code of the county-level region where a station is located, is obtained from the pre-configuration according to the station number.
Other fields come from system functions and attribute information; for example, the warehousing time field comes from the system time, and the warehoused file name comes from the warehoused data itself.
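A statement of this shape can be assembled as in the sketch below. Note one deliberate difference from the text: the sketch emits `?` placeholders and returns the values separately instead of splicing the values into the SQL string, which is a common way to avoid quoting problems. The function name, table name and field names are hypothetical.

```python
from typing import Any, Dict, Tuple


def build_insert(table: str, fields: Dict[str, Any]) -> Tuple[str, tuple]:
    """Build 'insert into table (f1, f2, ...) values (?, ?, ...)' plus its values."""
    if not fields:
        raise ValueError("no fields to insert")
    columns = ", ".join(fields)                   # dicts keep insertion order
    placeholders = ", ".join("?" for _ in fields)
    sql = f"insert into {table} ({columns}) values ({placeholders})"
    return sql, tuple(fields.values())
```

The table name would come from the configuration file, while the field dictionary would combine converted element values, management fields and system fields such as the warehousing time.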
The correction report processing includes: when the parsed BBB item is not '000', warehousing is performed only if the BBB item of the corresponding original record in the database is smaller than the parsed BBB item.
When the message to be warehoused is a correction report, the parsed BBB item is not '000' but 'CCX', where X may be any character from A to Z, with priority increasing from A to Z. This indicates that the record updates earlier data. The program looks up the BBB item of the original record in the database according to the observation time and station number of the current data, and compares it with the BBB item of the record to be warehoused. If the BBB item recorded in the database is greater than or equal to that of the record to be warehoused, the current record is not warehoused; otherwise, the record is updated and warehoused. For update warehousing, the corresponding record is first deleted from the database according to the station number and observation time, and then the warehousing statement generated from the current record is executed.
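The comparison rule for correction reports can be sketched as follows. The function name is hypothetical, and storing a correction when no original record exists is an assumption of this sketch that the text does not address. The sketch relies on plain string comparison, under which '000' sorts below 'CCA' and 'CCA' below 'CCB' in ASCII.

```python
from typing import Optional


def should_store_correction(new_bbb: str, existing_bbb: Optional[str]) -> bool:
    """Decide whether a correction report replaces the stored record."""
    if new_bbb == "000":
        raise ValueError("'000' is an original report, not a correction")
    if existing_bbb is None:
        return True  # assumption: no original record, so store the correction
    # Store only when the stored BBB item is strictly smaller than the new one.
    return existing_bbb < new_bbb
```

When the function returns True, the caller would delete the stored record by station number and observation time and then execute the new warehousing statement.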
The data warehousing includes: after the SQL statement has been assembled, the first decoding and warehousing Bolt calls the corresponding JDBC warehousing interface to warehouse the data.
As with the first decoding and warehousing Bolt, the second decoding Bolt and the second warehousing Bolt can be integrated into a single second decoding and warehousing Bolt, which reduces the transmission overhead between Bolts, as shown in FIG. 5. As with the BUFR data message, the hour data and minute data in BUFR file format are separate.
That is, the BUFR file message queue is divided into an hour BUFR file message queue and a minute BUFR file message queue. The hour BUFR file notification messages and minute BUFR file notification messages are converted into Tuples by their respective Spouts and then sent to their respective second decoding and warehousing Bolts, which decode and warehouse the respective BUFR file contents.
Unlike the BUFR file, the hour data and minute data in Z file format (i.e. text files) belong to the same class of data: one data file may contain both hour data and minute data, which must be separated after decoding. Because the warehousing timeliness requirement for hour data is higher, separate Bolts are used for hour warehousing and minute warehousing to guarantee the processing timeliness of hour data; that is, the third warehousing Bolt includes an hour warehousing Bolt and a minute warehousing Bolt, as shown in FIG. 6.
Based on this, S400 of the present invention includes:
S410: The Tuple converted from the text file notification message is sent to the third decoding Bolt, which reads the text file content from the shared file system according to the full-path file name contained in the text file notification message and decodes it.
S420: Decoded data whose observation time falls on minute 0 is sent to the hour warehousing Bolt for warehousing, and decoded data whose observation time does not fall on minute 0 is sent to the minute warehousing Bolt for warehousing.
This step makes the judgment from the decoded data itself: data whose observation-time minute is 0 is regarded as hour data and sent to the hour warehousing Bolt, and data whose observation-time minute is not 0 is regarded as minute data and sent to the minute warehousing Bolt. In this way, the processing efficiency of hour data is guaranteed even when a large amount of observation data arrives in a burst on the hour.
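The routing decision of S420 reduces to a test on the minute of the observation time, as the following sketch shows. The function name and the returned labels are hypothetical; in a real topology the labels would correspond to the streams feeding the two warehousing Bolts.

```python
from datetime import datetime


def target_warehousing_bolt(obs_time: datetime) -> str:
    """Route a decoded record: minute 0 is hour data, anything else minute data."""
    return "hour" if obs_time.minute == 0 else "minute"
```

Because the decision depends only on the decoded record, a single text file containing both kinds of data can be split record by record.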
The invention receives data by having the application program listen on the message queue, which improves the efficiency of data processing. As shown in FIG. 7, the consumer uses manual acknowledgment mode with the autoAck parameter set to false: RabbitMQ waits for the consumer to reply explicitly with an acknowledgment before removing the message from memory (in fact, the message is first marked for deletion and deleted afterwards). With manual acknowledgment the consumer has sufficient time to process each message, and there is no need to worry about messages being lost if the consumer goes down.
In actual operation, when a message cannot be processed, for example because the database cannot be connected, the message is rejected with the requeue parameter set to true, so that it re-enters the queue and is delivered to the next consumer for another processing attempt.
To cope with the message middleware being disconnected from the application program because of network problems, an automatic recovery mechanism for network anomalies is designed. When an anomaly occurs at the consumer, the connection is re-established automatically, with a connection request every 10 seconds until reconnection succeeds.
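The fixed-interval reconnection can be sketched independently of any RabbitMQ client library. In the sketch below, `connect` is a caller-supplied function (hypothetical) that returns a connection or raises `ConnectionError`; the `sleep` parameter exists only so that the delay can be replaced in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def reconnect_until_success(
    connect: Callable[[], T],
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() repeatedly, waiting a fixed delay after each failure."""
    while True:
        try:
            return connect()
        except ConnectionError:
            sleep(delay_seconds)  # wait, then request the connection again
```

A fixed 10-second interval matches the text; a production consumer might also cap the number of attempts or add alerting, which is outside this sketch.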
In the invention, after the Supervisor on a Storm worker node receives the Spout message content sent by the master node, it obtains the meteorological message to be decoded (including the data name, four-level code, meteorological observation data, etc.) and passes the task to different processing programs (Bolt programs) for decoding according to the topology design. For the decoding of BUFR files and text files, file-level checks are performed, such as whether the file name is empty, whether the file content is empty, and whether the length is reasonable. For BUFR messages, BUFR byte verification is performed. Data that passes the file-level check and the byte verification is decoded by calling the decoding module, with report-level checks, group checks, element checks and the like performed at the same time; the values of all elements are obtained by parsing, and the element values are then recombined for warehousing. The number and distribution of Spouts and Bolts on the servers can be configured and adjusted, so that resources are allocated as required.
When data is warehoused, a BUFR data message generally contains only one record, so each record is committed individually, and the overall warehousing speed is raised by increasing the concurrency of the Bolts.
For BUFR files and text files, one file contains a plurality of records, so warehousing efficiency is improved by committing in batches and per file. After a file is decoded, the decoded records are placed in a cache list, and the list is then processed and warehoused. When the number of records in the cache list reaches the set number, or when decoding of the BUFR file or text file is complete, the data in the cache list is committed for warehousing as a batch. When a batch commit fails, the data in the cache list is committed for warehousing one record at a time.
The details are as follows:
A file contains a variable number of records. During warehousing, the set number threshold is read from the configuration file; it defaults to 200. If the decoded list holds more than 200 records, the list is warehoused in batches: every 200 records in the list form one batch, which is committed as a whole. If the commit succeeds, all 200 records are warehoused. However, if warehousing any of the 200 records fails, the whole batch cannot be warehoused and the normal records in it would be lost. Therefore, when a batch commit fails, the records of that batch are committed one by one, so that the warehousing status of each record is obtained and all normal records are warehoused.
The set number threshold is configurable, so that different users can tune it to the actual conditions of their database and compute node resources and obtain the most efficient local value.
If a file holds fewer records than the set number threshold, then once the file has been processed, all the records in the list (1 <= number of records < threshold, default 200) are committed as one batch regardless of how many there are. This avoids the loss of warehousing timeliness that would result if files with too few records waited a long time without reaching the threshold.
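The batch commit with per-record fallback can be sketched as follows. The callbacks `commit_batch` and `commit_one` are hypothetical stand-ins for the warehousing interface and are assumed to raise an exception on failure; the sketch operates on the fully decoded list of one file, so the last, possibly short, batch is committed as soon as the file is finished.

```python
from typing import Any, Callable, List, Sequence


def store_records(
    records: Sequence[Any],
    commit_batch: Callable[[Sequence[Any]], None],
    commit_one: Callable[[Any], None],
    batch_size: int = 200,  # configurable threshold, default 200
) -> List[bool]:
    """Commit records in batches; on batch failure, retry that batch one by one.

    Returns the warehousing status of every record, in input order.
    """
    status: List[bool] = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        try:
            commit_batch(batch)
            status.extend([True] * len(batch))
        except Exception:
            # Fall back so that the normal records of this batch are not lost.
            for record in batch:
                try:
                    commit_one(record)
                    status.append(True)
                except Exception:
                    status.append(False)
    return status
```

The returned status list is the kind of per-record result from which monitoring information could later be generated.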
The method of the invention further comprises:
S500: DI information is generated according to the warehousing status and sent to the corresponding DIEI processing Bolt, where it is cached; once the cached DI information reaches a set number or a set time period elapses, the DIEI processing Bolt sends the DI information to Tianjing in batches.
In the traditional Storm processing framework, DI information (monitoring information) is sent one item at a time through a REST interface, and a TCP connection must be established for every item sent. Each TCP connection needs a local port, and the range of local ports is limited: the system default is typically 32768 to 60999, about 28,000 ports. When the minute data and hour data of more than 70,000 stations arrive in a burst, the amount of data to be processed on a node easily exceeds this number, and sending DI information through the REST interface fails for lack of local ports.
To keep the sending of monitoring information from affecting the normal business processing flow, the invention designs an independent DIEI processing Bolt for sending monitoring information, using batch and timed sending. The warehousing interface returns the warehousing status of each record; from this status the processing framework learns the warehousing result of the record and generates the corresponding DI information. The DI information is cached in the DIEI processing Bolt in a stack data structure; by default 200 items are cached and submitted as a batch, and the cache size is configurable. In addition, a timed task is created in the DIEI processing Bolt to send the contents of the DI cache stack at fixed intervals; it is triggered every 3 seconds by default, and the interval is configurable. This solves the problem of delayed sending when the volume is small and the DI information does not reach the cache threshold for a long time. A failure retry function is also added: if a DI batch fails to send, sending is repeated up to three times; if the third attempt still fails, the DI sending error is recorded in the log so that the cause can be investigated, and the batch is not sent again.
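The count-or-time flushing with bounded retries can be sketched as a small buffer class. Everything here is illustrative: the class name `DiBuffer`, the `send_batch` callback (assumed to raise on failure) and the `errors` list, which stands in for the log, are hypothetical, and a plain list is used where the text mentions a stack.

```python
import time
from typing import Any, Callable, List


class DiBuffer:
    """Cache DI items and send them in batches by count or by elapsed time."""

    def __init__(self, send_batch: Callable[[List[Any]], None],
                 max_items: int = 200, max_age_seconds: float = 3.0,
                 max_attempts: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send_batch = send_batch
        self._max_items = max_items
        self._max_age = max_age_seconds
        self._max_attempts = max_attempts
        self._clock = clock
        self._items: List[Any] = []
        self._last_flush = clock()
        self.errors: List[str] = []  # stands in for the error log

    def add(self, item: Any) -> None:
        self._items.append(item)
        if len(self._items) >= self._max_items:
            self.flush()  # count threshold reached

    def tick(self) -> None:
        """Called by a timer; sends cached items once they are old enough."""
        if self._items and self._clock() - self._last_flush >= self._max_age:
            self.flush()

    def flush(self) -> None:
        batch, self._items = self._items, []
        self._last_flush = self._clock()
        if not batch:
            return
        for _ in range(self._max_attempts):
            try:
                self._send_batch(batch)
                return
            except Exception as exc:
                last_error = exc
        # All attempts failed: record the error and drop the batch.
        self.errors.append(f"DI batch of {len(batch)} not sent: {last_error}")
```

Sending one request per batch instead of one per item is what keeps the number of TCP connections, and therefore of local ports, small.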
The invention adds the function of sending monitoring information, realizing whole-flow monitoring of data processing and seamless interfacing with the existing monitoring system Tianjing. It solves the loss of DI information caused by a shortage of network ports during mass data processing, minimizes the delay in sending DI information, and strikes a reasonable balance between the stability and the timeliness of the processing framework.
When a traditional Storm program starts, it initializes by loading a local station information configuration file (a Lua file) to obtain the station information and builds a hashmap data structure, through which the processing Bolts can quickly obtain the administrative division, station level and other information of a station that is not contained in the data message. Because the file is loaded statically, every time station information is updated the data file must be regenerated from the public metadata system and distributed to all Storm nodes, and all Storm topologies must be restarted; the maintenance procedure is complex and the workload is large.
To solve the above problems, the method of the present invention further includes:
Station information is obtained directly from the public metadata system by a station information update processing Bolt, compared with the station information in the local file, and the changed station information is applied to the Storm framework.
In the Storm processing framework of the invention, a station information update processing Bolt is designed. It obtains the administrative division, station level and other information of domestic stations directly from the public metadata system, compares it with the station information in the local file, and applies the changed information to the whole processing topology, so that station information is updated without a restart. The update is triggered by a timed task, and the update interval can be modified through the configuration file.
Specifically, the station information update processing Bolt performs the following operations after the topology starts, continuously updating the station information used by the decoding and warehousing Bolts.
(1) Determine whether the first-load flag is True (the default). If True, load the local station information file into memory and package it in Map<Key, Value> format; at the same time, request the full interface of the public metadata system, and set the first-load flag to False after processing. If False, request the incremental interface of the public metadata system.
(2) After the station and station network information has been obtained from the metadata full or incremental interface, package the data in Map<Key, Value> format, and then compare it with the local station information held in the memory of the station information update processing Bolt. If the two differ for a station, the data obtained from the metadata interface prevails, and the new station information is packaged into a new Map data structure. If a station does not exist in the locally loaded station information, the information of that station is added to the new Map data structure.
(3) After all stations have been compared and the new Map has been formed, the station information update processing Bolt distributes the information to all decoding/warehousing Bolts by replicated distribution, which ensures that every decoding/warehousing Bolt receives the update information.
(4) When a decoding/warehousing Bolt receives a Tuple, if the Tuple was sent by the station information update processing Bolt, the Bolt compares the Map information in the Tuple with the Lua cache in its own memory. For a new station, the cache is updated by adding the new station information; for an existing station, the metadata information of that station in the Map is updated and replaced.
(5) After completing one update, the background station information update processing Bolt sleeps for the configured time and then repeats steps (1) to (4), realizing periodic updating without a restart.
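The comparison in steps (2) and (4) amounts to merging a remote station map into a local one, with the remote data taking precedence. The sketch below is illustrative only; the function name and the example fields `division` and `level` are hypothetical, and stations are keyed by station number.

```python
from typing import Dict, Set, Tuple

Station = Dict[str, str]


def merge_station_info(
    local: Dict[str, Station],
    remote: Dict[str, Station],
) -> Tuple[Dict[str, Station], Set[str]]:
    """Merge metadata-system station info into the local map.

    Returns the new map and the set of station numbers added or changed.
    """
    merged = dict(local)
    changed: Set[str] = set()
    for station_id, info in remote.items():
        if merged.get(station_id) != info:
            merged[station_id] = info  # new station, or metadata prevails
            changed.add(station_id)
    return merged, changed
```

Only the changed entries would need to be distributed to the decoding/warehousing Bolts, and an empty `changed` set means the cycle can end without sending anything.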
The following specific examples illustrate the effects of the present invention:
FIG. 8 compares the Storm stream processing of the invention with the original multi-process processing. Before second 36, the data volume is small and Storm processing performs the same as the original processing; after second 36, the data volume increases sharply and Storm processing shows a clear performance advantage. The data arriving between seconds 36 and 47 (about 2089 stations, about 85% of the total) was fully processed by the Storm framework within 11 seconds, whereas the original multi-process processing handled the data of only 359 stations in the same time; the Storm processing speed is thus about 5.8 times that of the original multi-process processing.
An embodiment of the invention provides a Storm-based mass meteorological observation data processing device which, as shown in fig. 9, comprises:
The data acquisition module 1, in which a Spout of the Storm framework acquires message data from the RabbitMQ message queue and converts the message data into Tuples.
The message data of the BUFR data message queue, the BUFR file message queue and the text file message queue are BUFR data messages, BUFR file notification messages and text file notification messages, respectively; the BUFR file notification messages and the text file notification messages each contain a four-level code and a full-path file name.
The first decoding and warehousing module 2, used for transmitting the Tuple converted from the BUFR data message to the first decoding Bolt and the first warehousing Bolt for decoding and warehousing.
The second decoding and warehousing module 3, used for transmitting the Tuple converted from the BUFR file notification message to a second decoding Bolt; the second decoding Bolt reads the BUFR file content from the shared file system according to the full-path file name contained in the BUFR file notification message and decodes it, and after decoding is completed the content is warehoused through the second warehousing Bolt.
The third decoding and warehousing module 4, used for transmitting the Tuple converted from the text file notification message to a third decoding Bolt; the third decoding Bolt reads the text file content from the shared file system according to the full-path file name contained in the text file notification message and decodes it, and after decoding is completed the content is warehoused through the third warehousing Bolt.
The invention designs a new Storm processing framework that interfaces directly with BUFR-format messages from RabbitMQ, reducing the intermediate links between transmission and warehousing. Data is processed in a single pass: each message is one complete BUFR message, so the whole flow from transmission to warehousing runs without writing intermediate files to disk. Multiple Bolts run in parallel, which greatly improves processing timeliness.
The first decoding Bolt and the first warehousing Bolt may be integrated into a single first decoding-warehousing Bolt. The BUFR data message queue is divided into an hour BUFR data message queue and a minute BUFR data message queue; the hour BUFR data messages and minute BUFR data messages of these queues are converted into Tuples by their respective Spouts and then sent to their respective first decoding-warehousing Bolts, where they are decoded and warehoused separately.
Further, the operations of the first decoding-warehousing Bolt comprise, in order: format check, element decoding, missing-value check, time check, characteristic value conversion, warehousing SQL statement generation, correction report processing and data warehousing.
The BUFR data message comprises, in order, an indicator section, an identification section, an optional section, a data description section, a data section and an end section.
The format check includes: checking whether the indicator section starts with the character string "BUFR"; checking whether the field in the indicator section that gives the total length of the BUFR data message matches the actual total length of the message; checking whether the field that gives the length of each section matches the actual length of that section; performing a byte check calculation according to the data descriptors predefined in the data description section, and checking whether the calculated length matches the actual length. If all of the above checks pass, the format check passes and element decoding is performed; otherwise the format check fails and element decoding is not performed.
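The first three format checks can be sketched as below. This is an illustrative sketch under stated assumptions, not the patent's decoder: it assumes the BUFR edition 4 layout (8-octet indicator section, 3-octet length at the start of sections 1 to 4, optional-section flag in octet 10 of the identification section, end section "7777"). The descriptor-driven byte check is not covered, and the function names and the synthetic sample message are introduced here for illustration only.

```python
def check_bufr_format(msg: bytes) -> bool:
    """Return True if the structural length checks pass (edition 4 layout assumed)."""
    # Indicator section: "BUFR" (4 octets), total length (3 octets), edition (1 octet).
    if len(msg) < 12 or msg[:4] != b"BUFR":
        return False
    if int.from_bytes(msg[4:7], "big") != len(msg):
        return False
    end = len(msg) - 4  # the end section is the 4 octets "7777"
    if msg[end:] != b"7777":
        return False
    pos = 8
    if pos + 10 > end:
        return False
    # Octet 10 of the identification section flags whether an optional section follows.
    has_optional = bool(msg[pos + 9] & 0x80)
    remaining = 4 if has_optional else 3  # sections 1, (2), 3 and 4
    for _ in range(remaining):
        if pos + 3 > end:
            return False
        sec_len = int.from_bytes(msg[pos:pos + 3], "big")
        if sec_len < 3 or pos + sec_len > end:
            return False
        pos += sec_len
    # The declared section lengths must add up exactly to the start of the end section.
    return pos == end


def build_sample_message() -> bytes:
    """Build a synthetic, content-free message that is only structurally valid."""
    sec1 = (22).to_bytes(3, "big") + bytes(19)  # flag octet is 0: no optional section
    sec3 = (9).to_bytes(3, "big") + bytes(6)
    sec4 = (6).to_bytes(3, "big") + bytes(3)
    body = sec1 + sec3 + sec4 + b"7777"
    total = 8 + len(body)
    return b"BUFR" + total.to_bytes(3, "big") + bytes([4]) + body
```

Rejecting a malformed message at this stage means the more expensive element decoding only runs on messages whose declared lengths are consistent.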
The element decoding includes: calling the corresponding decoding algorithm to decode the elements, obtaining metadata information including observation time, station number, station, longitude and latitude, and altitude, and element values including temperature, air pressure, wind direction, wind speed, humidity and precipitation.
The missing-value check includes: setting missing element values to a preset missing value.
The time check includes: discarding data whose decoded observation time is more than 24 hours later than the current time, and data whose decoded observation time is more than 672 hours earlier than the current time.
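The time check amounts to a window around the current time. The sketch below assumes that observations exactly on either boundary are kept; the source does not say how boundary values are treated, and the names are illustrative.

```python
from datetime import datetime, timedelta

MAX_FUTURE = timedelta(hours=24)   # observation may lead the current time by at most 24 h
MAX_PAST = timedelta(hours=672)    # observation may lag the current time by at most 672 h (28 days)


def passes_time_check(obs_time: datetime, now: datetime) -> bool:
    """Return True if the decoded observation time lies inside the accepted window."""
    return now - MAX_PAST <= obs_time <= now + MAX_FUTURE
```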
The characteristic value conversion includes: setting specific element values to preset characteristic values.
The generation of the warehousing SQL statement includes: reading the configured table name from the configuration file and, according to pre-designed warehousing rules, mapping warehousing fields to element values in code to generate the warehousing SQL statement; the warehousing SQL statement also includes management fields read from the configuration file and fields derived from system functions and attribute information.
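A minimal sketch of the statement generation follows. The table name, column names and the `?` placeholder style are assumptions made for illustration; the patent does not disclose its warehousing rules or database driver. The table and column names are taken from trusted configuration, while values are passed as parameters.

```python
from typing import Dict, List, Tuple


def build_insert_sql(table: str,
                     field_map: Dict[str, str],
                     decoded: Dict[str, object],
                     managed: Dict[str, object]) -> Tuple[str, List[object]]:
    """Build a parameterized INSERT statement.

    field_map maps a warehousing column to the key of a decoded element value;
    managed holds management fields (from configuration or system functions).
    """
    columns = list(field_map) + list(managed)
    # Elements missing from the decoded record become None here.
    params = [decoded.get(key) for key in field_map.values()] + list(managed.values())
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, params
```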
The correction report processing includes: when the parsed BBB item is not '000', the data is warehoused only if the corresponding original BBB item in the database is smaller than the parsed BBB item.
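The correction rule can be sketched as a simple comparison. Two branches below are assumptions not stated in the source: a report whose BBB item is '000' is stored through the normal path, and a correction with no stored counterpart is stored. BBB values are compared as strings.

```python
from typing import Optional


def should_store(parsed_bbb: str, stored_bbb: Optional[str]) -> bool:
    """Decide whether a decoded report should be warehoused."""
    if parsed_bbb == "000":
        return True   # assumption: not a correction report, normal warehousing applies
    if stored_bbb is None:
        return True   # assumption: nothing stored yet for this observation
    # Store only when the stored BBB item is smaller than the parsed one.
    return stored_bbb < parsed_bbb
```

With this rule an older correction that arrives late cannot overwrite a newer one already in the database.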
The data warehousing includes: calling the corresponding warehousing interface to warehouse the data.
The second decoding Bolt and the second warehousing Bolt may be integrated into a single second decoding-warehousing Bolt. The BUFR file message queue is divided into an hour BUFR file message queue and a minute BUFR file message queue; the hour BUFR file notification messages and minute BUFR file notification messages of these queues are converted into Tuples by their respective Spouts and then sent to their respective second decoding-warehousing Bolts, where the contents of the respective BUFR files are decoded and warehoused separately.
The third warehousing Bolt comprises an hour warehousing Bolt and a minute warehousing Bolt.
Correspondingly, the third decoding and warehousing module comprises:
The decoding unit, used for transmitting the Tuple converted from the text file notification message to a third decoding Bolt; the third decoding Bolt reads the text file content from the shared file system according to the full-path file name contained in the text file notification message and decodes it.
The warehousing unit, used for sending decoded data whose observation time has a minute value of 0 to the hour warehousing Bolt for warehousing, and sending decoded data whose observation time has a non-zero minute value to the minute warehousing Bolt for warehousing.
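The routing decision made by the warehousing unit depends only on the minute field of the decoded observation time. A minimal sketch, with the stream names "hour" and "minute" chosen here for illustration:

```python
from datetime import datetime


def route_stream(obs_time: datetime) -> str:
    """Pick the target warehousing Bolt from the minute of the observation time."""
    return "hour" if obs_time.minute == 0 else "minute"
```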
For BUFR data messages, warehousing is performed by committing records one at a time.
For BUFR files and text files, the decoded data is recorded into a cache list; once the number of records in the cache list reaches a set number, or once decoding of the BUFR file or text file is complete, the data in the cache list is committed for warehousing in a batch.
If the batch commit fails, the data in the cache list is committed one record at a time.
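The batch commit with per-record fallback can be sketched as follows. The sketch uses Python's built-in `sqlite3` module purely so that it is runnable; the patent does not name its database, and the class name `BatchWriter`, the `failed` list and the default batch size are assumptions.

```python
import sqlite3
from typing import List, Tuple


class BatchWriter:
    """Buffer decoded rows and commit them in batches, falling back to single commits."""

    def __init__(self, conn: sqlite3.Connection, sql: str, batch_size: int = 500):
        self.conn = conn
        self.sql = sql
        self.batch_size = batch_size
        self.buffer: List[Tuple] = []
        self.failed: List[Tuple] = []   # rows that could not be stored even one by one

    def add(self, row: Tuple) -> None:
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Call when the batch size is reached and once more after a file is fully decoded."""
        if not self.buffer:
            return
        rows, self.buffer = self.buffer, []
        try:
            with self.conn:  # commits on success, rolls the whole batch back on error
                self.conn.executemany(self.sql, rows)
        except sqlite3.Error:
            # Batch failed: retry each row in its own transaction.
            for row in rows:
                try:
                    with self.conn:
                        self.conn.execute(self.sql, row)
                except sqlite3.Error:
                    self.failed.append(row)
```

The fallback keeps one bad record from blocking the rest of a file, at the cost of slower commits for that batch only.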
The device of the invention may further comprise:
The monitoring module, used for generating DI information according to the warehousing status and sending it to the corresponding DIEI processing Bolt; the DIEI processing Bolt caches the DI information and sends it in batches to the Tianjing (天镜) monitoring system once the cached DI information reaches a set quantity or a set time period has elapsed.
The station information update processing module, used for periodically acquiring station information directly from the public metadata system through the station information update processing Bolt, comparing it with the station information in the local file, and applying the changed station information to the Storm framework.
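The caching behaviour of the monitoring module (send when a set quantity is reached or a set time has elapsed) can be sketched as below. The class name `DIBuffer`, its thresholds and the `send` callback are assumptions; how the time condition is triggered inside the DIEI processing Bolt is not described in the source, so here `maybe_flush` is simply assumed to be called periodically.

```python
import time
from typing import Callable, List, Optional


class DIBuffer:
    """Cache DI records and hand them to `send` in batches by count or by age."""

    def __init__(self, send: Callable[[List[object]], None],
                 max_items: int = 100, max_age_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.send = send
        self.max_items = max_items
        self.max_age_s = max_age_s
        self.clock = clock
        self.items: List[object] = []
        self.first_at: Optional[float] = None   # time the oldest cached record arrived

    def add(self, di: object) -> None:
        if not self.items:
            self.first_at = self.clock()
        self.items.append(di)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Send the cached batch if either threshold has been reached."""
        if not self.items:
            return
        too_many = len(self.items) >= self.max_items
        too_old = self.clock() - self.first_at >= self.max_age_s
        if too_many or too_old:
            batch, self.items = self.items, []
            self.send(batch)
```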
The device provided by the embodiment of the present invention has the same implementation principle and technical effects as the foregoing method embodiment. For brevity, where the device embodiment is silent, reference may be made to the corresponding content in the foregoing method embodiment. It will be clear to those skilled in the art that, for convenience and brevity, the specific working procedures of the device and units described above may refer to the corresponding procedures in the above method embodiment and are not described here again.
Finally, it should be noted that the above examples are only specific embodiments of the present invention and are not intended to limit its scope of protection. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions of some of their technical features. Such modifications, changes or substitutions do not depart from the spirit and scope of the corresponding technical solutions and are intended to be encompassed within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A Storm-based mass meteorological observation data processing method, characterized by comprising the following steps:
S100: a Spout of the Storm framework obtains message data from a RabbitMQ message queue and converts the message data into Tuples;
the message data of the BUFR data message queue, the BUFR file message queue and the text file message queue are BUFR data messages, BUFR file notification messages and text file notification messages, respectively, and the BUFR file notification messages and the text file notification messages each comprise a four-level code and a full-path file name;
S200: transmitting the Tuple converted from the BUFR data message to a first decoding Bolt and a first warehousing Bolt for decoding and warehousing;
S300: transmitting the Tuple converted from the BUFR file notification message to a second decoding Bolt, wherein the second decoding Bolt reads the BUFR file content from the shared file system according to the full-path file name contained in the BUFR file notification message and decodes it, and after decoding is completed the content is warehoused through a second warehousing Bolt;
S400: transmitting the Tuple converted from the text file notification message to a third decoding Bolt, wherein the third decoding Bolt reads the text file content from the shared file system according to the full-path file name contained in the text file notification message and decodes it, and after decoding is completed the content is warehoused through a third warehousing Bolt.
2. The Storm-based mass meteorological observation data processing method according to claim 1, wherein the first decoding Bolt and the first warehousing Bolt are an integrated first decoding-warehousing Bolt, the BUFR data message queue is divided into an hour BUFR data message queue and a minute BUFR data message queue, and the hour BUFR data messages and minute BUFR data messages of the hour BUFR data message queue and the minute BUFR data message queue are converted into Tuples by their respective Spouts and then sent to their respective first decoding-warehousing Bolts for separate decoding and warehousing.
3. The Storm-based mass meteorological observation data processing method according to claim 2, wherein the operations of the first decoding-warehousing Bolt comprise, in order: format check, element decoding, missing-value check, time check, characteristic value conversion, warehousing SQL statement generation, correction report processing and data warehousing.
4. The Storm-based mass meteorological observation data processing method according to claim 3, wherein the BUFR data message comprises, in order, an indicator section, an identification section, an optional section, a data description section, a data section and an end section;
the format check includes: checking whether the indicator section starts with the character string "BUFR"; checking whether the field in the indicator section that gives the total length of the BUFR data message matches the actual total length of the BUFR data message; checking whether the field that gives the length of each section matches the actual length of that section; performing a byte check calculation according to the data descriptors predefined in the data description section, and checking whether the calculated length matches the actual length; if all of the above checks pass, the format check passes and the element decoding operation is executed; otherwise the format check fails and the element decoding operation is not executed;
and/or;
the element decoding includes: calling the corresponding decoding algorithm to decode the elements, obtaining metadata information comprising observation time, station number, station, longitude and latitude, and altitude, and element values comprising temperature, air pressure, wind direction, wind speed, humidity and precipitation;
and/or;
the missing-value check includes: setting missing element values to a preset missing value;
and/or;
the time check includes: discarding data whose decoded observation time is more than 24 hours later than the current time, and data whose decoded observation time is more than 672 hours earlier than the current time;
and/or;
the characteristic value conversion includes: setting specific element values to preset characteristic values;
and/or;
the generation of the warehousing SQL statement includes: reading the configured table name from the configuration file and, according to pre-designed warehousing rules, mapping warehousing fields to element values in code to generate the warehousing SQL statement; the warehousing SQL statement also comprises management fields read from the configuration file and fields derived from system functions and attribute information;
and/or;
the correction report processing includes: when the parsed BBB item is not '000', warehousing is performed only if the corresponding original BBB item in the database is smaller than the parsed BBB item;
and/or;
the data warehousing includes: calling the corresponding warehousing interface to warehouse the data.
5. The Storm-based mass meteorological observation data processing method according to any one of claims 1 to 4, wherein the second decoding Bolt and the second warehousing Bolt are an integrated second decoding-warehousing Bolt, the BUFR file message queue is divided into an hour BUFR file message queue and a minute BUFR file message queue, and the hour BUFR file notification messages and minute BUFR file notification messages of the hour BUFR file message queue and the minute BUFR file message queue are converted into Tuples by their respective Spouts and then sent to their respective second decoding-warehousing Bolts, where the contents of the respective BUFR files are decoded and warehoused separately.
6. The Storm-based mass meteorological observation data processing method according to claim 5, wherein the third warehousing Bolt comprises an hour warehousing Bolt and a minute warehousing Bolt;
the S400 includes:
S410: transmitting the Tuple converted from the text file notification message to a third decoding Bolt, wherein the third decoding Bolt reads the text file content from the shared file system according to the full-path file name contained in the text file notification message and decodes it;
S420: sending decoded data whose observation time has a minute value of 0 to the hour warehousing Bolt for warehousing, and sending decoded data whose observation time has a non-zero minute value to the minute warehousing Bolt for warehousing.
7. The Storm-based mass meteorological observation data processing method according to claim 6, wherein for BUFR data messages, warehousing is performed by committing records one at a time;
for BUFR files and text files, the decoded data is recorded into a cache list, and once the number of records in the cache list reaches a set number, or once decoding of the BUFR file or text file is complete, the data in the cache list is committed for warehousing in a batch;
if the batch commit fails, the data in the cache list is committed one record at a time.
8. The Storm-based mass meteorological observation data processing method of claim 7, further comprising:
S500: generating DI information according to the warehousing status and sending it to the corresponding DIEI processing Bolt, caching the DI information in the DIEI processing Bolt, and having the DIEI processing Bolt send the DI information in batches to the Tianjing (天镜) monitoring system once the cached DI information reaches a set quantity or a set time period has elapsed.
9. The Storm-based mass meteorological observation data processing method of claim 8, further comprising:
directly acquiring station information from the public metadata system through the station information update processing Bolt, comparing it with the station information in the local file, and applying the changed station information to the Storm framework.
10. A Storm-based mass meteorological observation data processing device, the device comprising:
the data acquisition module, in which a Spout of the Storm framework acquires message data from a RabbitMQ message queue and converts the message data into Tuples;
the message data of the BUFR data message queue, the BUFR file message queue and the text file message queue are BUFR data messages, BUFR file notification messages and text file notification messages, respectively, and the BUFR file notification messages and the text file notification messages each comprise a four-level code and a full-path file name;
the first decoding and warehousing module, used for transmitting the Tuple converted from the BUFR data message to a first decoding Bolt and a first warehousing Bolt for decoding and warehousing;
the second decoding and warehousing module, used for transmitting the Tuple converted from the BUFR file notification message to a second decoding Bolt, wherein the second decoding Bolt reads the BUFR file content from the shared file system according to the full-path file name contained in the BUFR file notification message and decodes it, and after decoding is completed the content is warehoused through the second warehousing Bolt;
the third decoding and warehousing module, used for transmitting the Tuple converted from the text file notification message to a third decoding Bolt, wherein the third decoding Bolt reads the text file content from the shared file system according to the full-path file name contained in the text file notification message and decodes it, and after decoding is completed the content is warehoused through the third warehousing Bolt.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211678249.9A CN116126552A (en) | 2022-12-26 | 2022-12-26 | Mass meteorological observation data processing method and device based on Storm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116126552A true CN116126552A (en) | 2023-05-16 |
Family
ID=86307279
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211678249.9A Pending CN116126552A (en) | 2022-12-26 | 2022-12-26 | Mass meteorological observation data processing method and device based on Storm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116126552A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117951205A (en) * | 2023-11-14 | 2024-04-30 | 国家气象信息中心(中国气象局气象数据中心) | GTS multi-format sounding message real-time conversion method and device |
CN118158289A (en) * | 2024-05-09 | 2024-06-07 | 国家气象信息中心(中国气象局气象数据中心) | Meteorological automatic station standard format data message transmission method, device and equipment |
CN118626879A (en) * | 2024-08-09 | 2024-09-10 | 国家气象信息中心(中国气象局气象数据中心) | BUFR data processing method, device and equipment based on template recognition |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104092718A (en) * | 2013-12-12 | 2014-10-08 | 腾讯数码(天津)有限公司 | Distributed system and configuration information updating method in distributed system |
CN106897159A (en) * | 2017-01-20 | 2017-06-27 | 武汉华信联创技术工程有限公司 | A kind of system and method for gathering Data of Automatic Weather |
CN115220131A (en) * | 2022-06-23 | 2022-10-21 | 阿里巴巴(中国)有限公司 | Meteorological data quality inspection method and system |
Non-Patent Citations (2)
Title |
---|
冯勇 等: "流式计算技术在山东省非考核地面气象自动站数据实时处理中的应用", 数字技术与应用, vol. 40, no. 9, pages 1 - 5 * |
廖婷婷 等: "Storm流式技术在地面气象数据处理中的应用", 中低纬山地气象, vol. 43, no. 5, pages 78 - 81 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117951205A (en) * | 2023-11-14 | 2024-04-30 | 国家气象信息中心(中国气象局气象数据中心) | GTS multi-format sounding message real-time conversion method and device |
CN118158289A (en) * | 2024-05-09 | 2024-06-07 | 国家气象信息中心(中国气象局气象数据中心) | Meteorological automatic station standard format data message transmission method, device and equipment |
CN118158289B (en) * | 2024-05-09 | 2024-09-10 | 国家气象信息中心(中国气象局气象数据中心) | Meteorological automatic station standard format data message transmission method, device and equipment |
CN118626879A (en) * | 2024-08-09 | 2024-09-10 | 国家气象信息中心(中国气象局气象数据中心) | BUFR data processing method, device and equipment based on template recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20230516 |