CN113111140A - Method for rapidly analyzing multi-source marine business observation data - Google Patents

Publication number
CN113111140A
Authority
CN
China
Prior art keywords
analysis
data
rule
window
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110516907.3A
Other languages
Chinese (zh)
Inventor
梁建峰
宋晓
韩璐遥
郑兵
韦广昊
杨锦坤
杨扬
耿姗姗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NATIONAL MARINE DATA AND INFORMATION SERVICE
Original Assignee
NATIONAL MARINE DATA AND INFORMATION SERVICE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NATIONAL MARINE DATA AND INFORMATION SERVICE
Priority to CN202110516907.3A
Publication of CN113111140A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for rapidly analyzing multi-source marine business observation data, comprising the following steps: S1, receiving original data files of multi-source marine business observation data, classifying them according to intelligent identification rules, and forwarding them, according to the classification result, to different analysis windows of an analyzer through a pre-built multi-drive parallel rule window; S2, each analysis window analyzing the original data file according to configuration information to obtain standard data; and S3, verifying the standard data file and loading it into a database in batches. The method addresses the analysis speed of data processing in this specific application scenario, and uses a configurable complex-rule algorithm to complete data calculation, deduplication and quality control inside the window, thereby reducing the computing resources consumed by data calculation and storage.

Description

Method for rapidly analyzing multi-source marine business observation data
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a method for rapidly analyzing multi-source marine business observation data.
Background
Marine environment observation is the core means of acquiring data on marine environmental elements. Because observation platforms and systems are diverse, the business data collected by marine observation instruments come in many types and formats, which makes the comprehensive analysis and use of marine environment observation business data difficult. In recent years, as marine observation services have deepened, observation technology has advanced and equipment platforms have multiplied, the volume of marine environment observation data has grown, data format styles have proliferated, and domestic and foreign data files use different storage formats, all of which pose greater challenges for the analysis and use of marine business observation data.
At present, data processing mostly focuses on data preprocessing techniques. These are general-purpose methods that deal with characteristics of the acquired data, such as handling missing and repeated values, removing unique attributes, attribute coding, feature selection and principal component analysis. Marine business observation data are multi-source, multi-type, multi-format and domain-specific, and existing processing methods for marine business data mostly apply traditional preprocessing in depth within a single discipline. The following problems therefore exist. First, existing methods handle multi-source data with a separate processing strategy for each service, cannot automatically identify multi-source processing requirements, and lack a unified rapid processing capability. Second, existing methods address only preprocessing such as deduplication and missing-value handling, and cannot meet the need to integrate the complex data conversion algorithms of the professional field.
Disclosure of Invention
In view of this, the present invention aims to provide a method for rapidly analyzing multi-source marine business observation data, so as to improve analysis efficiency.
To achieve this purpose, the technical solution of the invention is realized as follows:
A method for rapidly analyzing multi-source marine business observation data comprises the following steps:
S1, receiving original data files of multi-source marine business observation data, classifying them according to intelligent identification rules, and forwarding them, according to the classification result, to different analysis windows of an analyzer through a pre-built multi-drive parallel rule window;
S2, each analysis window analyzing the original data file according to configuration information to obtain standard data;
S3, verifying the standard data file, and loading it into a database in batches.
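The three steps can be read as a small pipeline. The sketch below is only an illustration under stated assumptions, not the patented implementation: the function names (`classify`, `parse`, `verify_and_load`, `run_pipeline`), the file-name prefixes in `RULES`, the comma-separated input format and the in-memory list standing in for the database are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical identification rules: file-name prefix -> analysis window name.
RULES: Dict[str, str] = {"SS_BUOY": "south_sea_buoy", "ES_BUOY": "east_sea_buoy"}


def classify(file_name: str) -> Optional[str]:
    """S1: return the analysis window for a file, or None if no rule matches."""
    for prefix, window in RULES.items():
        if file_name.startswith(prefix):
            return window
    return None


def parse(window: str, raw_lines: List[str]) -> List[dict]:
    """S2: turn raw 'station,value' lines into standard records."""
    records = []
    for line in raw_lines:
        station, value = line.split(",")
        records.append({"window": window, "station": station, "value": float(value)})
    return records


def verify_and_load(records: List[dict], database: List[dict]) -> int:
    """S3: keep records that pass a simple check and load them in one batch."""
    valid = [r for r in records if r["station"]]
    database.extend(valid)
    return len(valid)


def run_pipeline(file_name: str, raw_lines: List[str], database: List[dict]) -> int:
    """Run S1 to S3 for one file and return the number of records loaded."""
    window = classify(file_name)
    if window is None:
        return 0  # unidentified files are handled separately (backup and alarm)
    return verify_and_load(parse(window, raw_lines), database)
```

Calling `run_pipeline("SS_BUOY_001.txt", ["S1,1.5", "S2,2.0"], db)` with an empty list `db` loads two records tagged with the `south_sea_buoy` window.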
Further, step S1 specifically includes the following steps:
S11, identifying the data type and source of the original data file, and configuring a label;
S12, classifying the original data files according to the labels through the intelligent identification rules;
S13, scheduling a distribution link according to the classification result by means of the multi-drive parallel rule window, and pushing the file to an analysis window of the analyzer; when it is established, the multi-drive parallel rule window schedules a multi-threaded parallel consumption mode, integrating parallel techniques so that multiple drives and multiple windows can be scheduled and a parallel algorithm scheduling capability is formed.
Further, in step S12, if the original data file is judged not to comply with the intelligent identification rules, the original file is backed up to a designated directory and an alarm is issued.
Further, in step S2, the configuration information includes an analysis rule configuration, a complex algorithm configuration, an algorithm, and an analysis priority rule configuration, wherein:
the analysis rule configuration constructs corresponding analysis rules according to the particular characteristics of the marine observation data, forming an analysis rule set;
the complex algorithm configuration adds algorithm analysis steps on top of the common analysis rules, according to the analysis requirements of each kind of service data.
Further, in step S3, an output device receives an output trigger command and judges whether the verification standard requirements are met. If they are met, the data are stored in the database, a command to remove the analysis window is triggered, and an output success record is returned; if not, the data are marked and returned to the analysis process.
Compared with the prior art, the method has the following advantages:
The method addresses the analysis speed of data processing in this specific application scenario, and uses a configurable complex-rule algorithm to complete data calculation, deduplication and quality control inside the window, thereby reducing the computing resources consumed by data calculation and storage.
Drawings
The accompanying drawings, which form a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention without limiting it. In the drawings:
FIG. 1 is a flowchart of the data forwarding rules and processing according to an embodiment of the present invention;
FIG. 2 is an overall flowchart of data analysis according to an embodiment of the present invention;
FIG. 3 is a flowchart of the analyzer's analysis process according to an embodiment of the present invention;
FIG. 4 is a flowchart of data output processing according to an embodiment of the present invention;
FIG. 5 is a sample of an original data file according to an embodiment of the present invention;
FIG. 6 is a sample of a data file after analysis according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention discloses a method for rapidly analyzing multi-source marine business observation data, which comprises the following steps:
Step 1, constructing a repeater for classifying and forwarding the received original data files of multi-source marine business observation data.
The repeater receives an original data file of multi-source marine business observation data, classifies it according to the intelligent identification rules, and forwards it, according to the classification result, to different analysis windows of the analyzer through the pre-built multi-drive parallel rule window.
Therefore, before the original data files of step 1 are distributed, a multi-drive parallel rule window is configured according to the characteristics of marine business observation data, so as to improve forwarding and analysis efficiency.
The multi-drive parallel rule window is one of the key techniques by which the invention improves efficiency. When marine business observation data files are pushed to the analyzer's processing windows they queue for a long time, and as the flow of marine business data grows, the analysis process becomes seriously congested. In addition, when the analyzer analyzes a file, the analysis and calculation of the complex algorithm rules also rely on multiple concurrency; without multi-drive concurrency, processing is limited by the available concurrency capability, and efficient, rapid analysis cannot be achieved.
When it is established, the multi-drive parallel rule window of the embodiment schedules a multi-threaded parallel consumption mode, integrating parallel techniques so that multiple drives and multiple windows can be scheduled and a parallel algorithm scheduling capability is formed.
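One way to picture a multi-threaded parallel consumption mode is a queue per analysis window, each drained by several worker threads. The sketch below makes that assumption; the patent does not specify the mechanism, and the names `start_windows`, `stop_windows` and `workers_per_window` are hypothetical. It uses only the standard `queue` and `threading` modules.

```python
import queue
import threading


def start_windows(window_names, handler, workers_per_window=2):
    """Create one queue per analysis window and start consumer threads for it."""
    queues = {name: queue.Queue() for name in window_names}
    threads = []

    def consume(name):
        q = queues[name]
        while True:
            item = q.get()
            try:
                if item is None:        # sentinel: stop this worker
                    return
                handler(name, item)     # analyze one file in this window
            finally:
                q.task_done()

    for name in window_names:
        for _ in range(workers_per_window):
            t = threading.Thread(target=consume, args=(name,), daemon=True)
            t.start()
            threads.append(t)
    return queues, threads


def stop_windows(queues, threads, workers_per_window=2):
    """Send one sentinel per worker, then wait for all workers to exit."""
    for q in queues.values():
        for _ in range(workers_per_window):
            q.put(None)
    for t in threads:
        t.join()
```

Files routed to different windows are consumed independently, so a slow window does not block the others. Because each queue is first-in, first-out, the sentinels are only reached after every file queued before them has been taken by a worker.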
The repeater of this scheme is based on distribution technology, which it combines with intelligent identification technology for the particular application scenario of marine business observation. As shown in FIG. 1, based on the configured rule tags, it intelligently identifies the type and source of each received data file, then returns the identification result and records it in a data tag table so that the file can be tracked.
According to the intelligent identification rules, the file's classification is judged automatically, the distribution link is scheduled, and the marine business observation data file is pushed to the analyzer for processing in multiple windows, forming a multi-window concurrent forwarding and scheduling mechanism.
The intelligent identification rules are configured on the source and the type of the marine business observation data, for example a North Sea buoy or East Sea buoy source, or a message data format or XML data format.
According to the combined source-and-type rules configured for intelligent identification, classification labels are assigned to form a classification rule table; the table is mapped against the attributes of the received multi-source data, a classification link is formed automatically, and the data file is forwarded automatically to a processing window of the analyzer.
If the original data file is judged not to comply with the intelligent identification rules, the original file is backed up to the designated directory and an alarm is issued.
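The classification rule table and the fallback for unidentified files might look like the following sketch. Everything named here is an assumption made for illustration: the `(source, type)` keys, the window names, the `tag_table` list standing in for the data tag table, and the use of a log warning as the alarm.

```python
import logging
import shutil
from pathlib import Path

# Hypothetical classification rule table: (source, data type) -> analysis window.
CLASSIFICATION_RULES = {
    ("north_sea_buoy", "message"): "window_buoy_message",
    ("east_sea_buoy", "xml"): "window_buoy_xml",
}

tag_table = []  # stands in for the data tag table used to track files


def forward(path: Path, source: str, data_type: str, backup_dir: Path):
    """Tag a file, then return its analysis window or back it up and alarm."""
    window = CLASSIFICATION_RULES.get((source, data_type))
    # Record the identification result so the file can be tracked later.
    tag_table.append({"file": path.name, "source": source,
                      "type": data_type, "window": window})
    if window is None:
        # No rule matches: keep a copy of the original file and raise an alarm.
        backup_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, backup_dir / path.name)
        logging.warning("no identification rule matches %s", path.name)
        return None
    return window
```

Keeping the rules in a table keyed by source and type means a new data source can be supported by adding a row rather than changing the forwarding code.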
Step 2, constructing an analyzer that contains a complex configuration rule algorithm; each analysis window analyzes the original data file according to the configuration information to obtain standard data.
The conventional overall data analysis process is shown in FIG. 2; it mainly converts an original data file into a standard data file and issues a file loading event. The analyzer constructed here additionally solves the problem of integrating existing preprocessing techniques with the complex algorithms of the professional field. The complex configuration rule algorithm meets the data analysis requirements of the specific application scenario of marine business observation data: it allows analysis algorithms for multi-dimensional marine data, such as U component quality factors, longitude and latitude, space and time, to be configured as rules, and so integrates marine-specific analysis algorithms with analysis technology.
The analyzer invokes the multi-drive parallel rule window constructed in step 1, and analyzes the data files pushed by the repeater in step 1 by matching the multi-drive parallel rule window with the analysis rules. The files pushed by the repeater according to the classification rules are delivered through the multi-drive parallel rule window, and the analysis rules are scheduled according to the classification made in step 1, completing concurrent analysis.
The analyzer needs an algorithm configuration component that implements the algorithms required by the complex configuration rules. This component integrates complex rule algorithms and models such as lVal × sin(lDir) and (LatDuFen - LatDu) × 60, and enables complex algorithm analysis rules to be configured during analysis for multi-dimensional professional fields such as U component quality factors, longitude and latitude, space and time.
The analyzer rule configuration includes the analysis rule configuration, the complex algorithm configuration, the algorithm, the analysis priority rules, and so on, as shown in FIG. 3.
1) The analysis rule configuration constructs corresponding analysis rules according to the particular characteristics of the marine observation data, forming an analysis rule set that contains rules such as OSMAR-041, OSMAR-S ASSIC and volunteer punctual messages.
2) The complex algorithm configuration adds algorithm analysis steps on top of the common analysis rules, according to the analysis requirements of each kind of service data, and integrates complex rule algorithms such as lVal × sin(lDir) and (LatDuFen - LatDu) × 60 into the calculation.
3) Standard processing such as deduplication and quality control is performed on the analyzed file.
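A rule set with priorities and attached algorithm steps can be sketched as below. This is a hedged illustration rather than the patent's configuration format: the `ParsingRule` class, the comma-separated line layout and the "lower priority value is tried first" convention are assumptions, as are the interpretations that lDir is a direction in degrees and that LatDuFen is a latitude in decimal degrees whose whole-degree part is LatDu.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, float]


def add_u_component(rec: Record) -> None:
    """lVal x sin(lDir); lDir is assumed to be in degrees."""
    rec["u"] = rec["lVal"] * math.sin(math.radians(rec["lDir"]))


def add_lat_minutes(rec: Record) -> None:
    """(LatDuFen - LatDu) x 60; assumes LatDuFen is in decimal degrees."""
    rec["lat_min"] = (rec["LatDuFen"] - rec["LatDu"]) * 60


@dataclass
class ParsingRule:
    name: str
    priority: int                       # assumption: lower value is tried first
    fields: List[str]                   # column names of a comma-separated line
    algorithms: List[Callable[[Record], None]] = field(default_factory=list)

    def parse(self, line: str) -> Optional[Record]:
        parts = line.split(",")
        if len(parts) != len(self.fields):
            return None                 # line does not fit this rule
        try:
            rec = {k: float(v) for k, v in zip(self.fields, parts)}
        except ValueError:
            return None                 # non-numeric field: rule does not apply
        for step in self.algorithms:    # configured algorithm steps
            step(rec)
        return rec


def analyze(line: str, rules: List[ParsingRule]) -> Tuple[Optional[str], Optional[Record]]:
    """Try rules in priority order and return the first successful parse."""
    for rule in sorted(rules, key=lambda r: r.priority):
        rec = rule.parse(line)
        if rec is not None:
            return rule.name, rec
    return None, None
```

Because the algorithm steps are plain callables attached to a rule, a new calculation can be configured for one data type without touching the parsing of the others.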
By constructing the algorithm configuration component, the analyzer gains the ability to calculate data and to perform deduplication and quality control inside the window. The calculation algorithms are integrated into the analyzer (the processing flow is shown in the accompanying drawings), which reduces the computing resources consumed by data calculation and storage.
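In-window deduplication and quality control could be as simple as the following sketch. The key fields (`station`, `time`), the range-check form of quality control and the `qc_ok` flag are assumptions chosen for illustration; the patent does not state which checks are applied.

```python
def dedup_and_qc(records, key_fields=("station", "time"), limits=None):
    """Drop repeated records and flag values outside configured limits."""
    limits = limits or {}
    seen = set()
    result = []
    for rec in records:
        key = tuple(rec.get(f) for f in key_fields)
        if key in seen:
            continue                    # duplicate observation: keep the first one
        seen.add(key)
        flagged = dict(rec)             # copy so the input records stay unchanged
        flagged["qc_ok"] = all(
            rec.get(name) is not None and low <= rec[name] <= high
            for name, (low, high) in limits.items()
        )
        result.append(flagged)
    return result
```

Doing this while the records are still in the window avoids writing duplicates to storage and reading them back for a separate cleaning pass.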
Step 3, constructing an output device for rapidly outputting the standard files completed by the analyzer windows.
The analyzed standard data files are output automatically in batches by the constructed output device and loaded into the marine environment basic database.
The output device receives the output trigger carrying the result values of the parallel analysis windows, converts the analyzed data into standard files, writes the standard files to disk automatically and in batches, and triggers the loading program to store them in the database.
As shown in FIG. 4, the output device receives the output trigger command and checks the data against the standard requirements. If the requirements are met, the file is written to disk automatically, and the output device also triggers the command to remove the analysis window and returns an output success record. If not, the data are marked and returned to the analysis process.
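The output device's verify, load, remove-window and return-record sequence can be sketched with the standard `sqlite3` module. The table name `observations`, the `REQUIRED` field list, the `needs_reanalysis` mark and the shape of the returned record are all hypothetical; SQLite merely stands in for the marine environment basic database.

```python
import sqlite3

REQUIRED = ("station", "time", "value")   # hypothetical standard fields


def output_window(conn, windows, window_id):
    """Verify one window's records; batch-load them or mark them for re-analysis."""
    records = windows[window_id]
    ok = all(all(r.get(f) is not None for f in REQUIRED) for r in records)
    if not ok:
        for r in records:
            r["needs_reanalysis"] = True   # mark and hand back to the analyzer
        return {"window": window_id, "status": "returned"}
    with conn:                             # one transaction for the whole batch
        conn.executemany(
            "INSERT INTO observations (station, time, value) VALUES (?, ?, ?)",
            [(r["station"], r["time"], r["value"]) for r in records],
        )
    del windows[window_id]                 # remove the finished analysis window
    return {"window": window_id, "status": "success", "rows": len(records)}
```

Loading a whole window in one `executemany` call inside a single transaction keeps the batch atomic: either every record of the window is stored or none is.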
Examples
The main objective of the invention is the rapid analysis of multi-source marine business observation data. In this example, a received South Sea buoy observation file is analyzed according to the analysis requirements of marine business observation data to produce the standard record format for moored buoys, illustrating the specific analysis process.
Step 1, forwarding the multi-source original observation files
After the repeater receives the South Sea buoy minute message data and the buoy real-time observation data, it automatically identifies the data type and type characteristics, schedules links according to the number and characteristics of the files, forwards the files according to the forwarding rules, and schedules the analyzer's analysis windows. The original data file is shown in FIG. 5.
Step 2, analyzing the multi-source observation files
The analysis rules are mapped according to the rule by which the repeater triggered the analysis window, and complex algorithm calculations are carried out on the South Sea buoy minute message data and the South Sea buoy real-time observation data according to the characteristics of each data file.
According to the U component quality factor requirement, the lVal × sin(lDir) algorithm is called, the calculation is performed in parallel, and the result is pushed.
File rule processing is then performed according to processing rules such as missing values and quality conformity, and the analysis result is produced efficiently in parallel. A sample of the data file after analysis is shown in FIG. 6.
Step 3, outputting the standard file format
The successfully analyzed standard data file (in the standard record format for moored buoys) is loaded, stored and exported for use.
The above describes only preferred embodiments of the present invention and is not to be construed as limiting it; any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included in its scope of protection.

Claims (5)

1. A method for rapidly analyzing multi-source marine business observation data, characterized by comprising the following steps:
S1, receiving original data files of multi-source marine business observation data, classifying according to intelligent identification rules, and respectively forwarding to different analysis windows of an analyzer according to classification results by adopting a built multi-drive parallel rule window;
S2, the analysis window analyzes the original data file according to the configuration information to obtain a standard file;
S3, verifying the standard data file, and loading and storing the standard data file into a database in batches.
2. The method of claim 1, wherein: the step S1 specifically includes the following steps:
S11, identifying the data type and source of the original data file, and configuring a label;
S12, classifying the original data files according to the labels through intelligent identification rules;
S13, dispatching a distribution link according to the classification result by adopting a multi-drive parallel rule window, and pushing the distribution link to an analysis window of an analyzer; wherein
the multi-drive parallel rule window schedules a multi-thread parallel consumption mode in the establishing process, realizes the integration of parallel technologies, realizes the scheduling of multiple drives and multiple windows and forms the scheduling capability of a parallel algorithm.
3. The method of claim 2, wherein: in step S12, if the original data file is determined not to comply with the intelligent identification rule, the original file is backed up to the designated directory and an alarm is given.
4. The method of claim 1, wherein: in step S2, the configuration information includes an analysis rule configuration, a complex algorithm configuration, an algorithm and an analysis priority rule configuration; wherein
the analysis rule configuration is to construct a corresponding analysis rule according to the particularity of the marine observation data to form an analysis rule set;
the complex algorithm configuration is to add the steps of algorithm analysis on the basis of the common analysis rule according to the analysis requirements of each service data.
5. The method of claim 1, wherein: in step S3, the output trigger command is received by the output device, and whether the verification standard requirement is met is determined,
if the standard requirement is met, storing the data into a database, simultaneously triggering an analysis window removing command, and returning an output success record;
if not, marking the data and returning to the analysis process.
CN202110516907.3A 2021-05-12 2021-05-12 Method for rapidly analyzing multi-source marine business observation data Pending CN113111140A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110516907.3A CN113111140A (en) 2021-05-12 2021-05-12 Method for rapidly analyzing multi-source marine business observation data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110516907.3A CN113111140A (en) 2021-05-12 2021-05-12 Method for rapidly analyzing multi-source marine business observation data

Publications (1)

Publication Number Publication Date
CN113111140A true CN113111140A (en) 2021-07-13

Family

ID=76722065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110516907.3A Pending CN113111140A (en) 2021-05-12 2021-05-12 Method for rapidly analyzing multi-source marine business observation data

Country Status (1)

Country Link
CN (1) CN113111140A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030235194A1 (en) * 2002-06-04 2003-12-25 Mike Morrison Network processor with multiple multi-threaded packet-type specific engines
CN110716897A (en) * 2019-10-15 2020-01-21 北部湾大学 Cloud computing-based marine archive database parallelization construction method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
刘志杰等: "海洋底质标准化处理系统设计与开发", 《海洋信息》 *
宋晓等: "基于多架构混搭模式的极地海洋数据库建模技术研究", 《极地研究》 *
李彦等: "基于 XML 的海洋环境数据处理技术研究", 《海洋通报》 *
陈继香: "XML 在海洋数据服务领域的应用研究", 《海洋通报》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306574A (en) * 2023-04-10 2023-06-23 黄石宏付信息科技有限公司 Big data mining method and server applied to intelligent wind control task analysis
CN116306574B (en) * 2023-04-10 2024-01-09 乌鲁木齐汇智兴业信息科技有限公司 Big data mining method and server applied to intelligent wind control task analysis
CN116303475A (en) * 2023-05-17 2023-06-23 吉奥时空信息技术股份有限公司 Management method and device for intelligent storage of multi-source index data
CN116303475B (en) * 2023-05-17 2023-08-08 吉奥时空信息技术股份有限公司 Management method and device for intelligent storage of multi-source index data
CN116628451A (en) * 2023-05-31 2023-08-22 江苏华存电子科技有限公司 High-speed analysis method for information to be processed
CN116628451B (en) * 2023-05-31 2023-11-14 江苏华存电子科技有限公司 High-speed analysis method for information to be processed

Similar Documents

Publication Publication Date Title
CN113111140A (en) Method for rapidly analyzing multi-source marine business observation data
CN105045820B (en) Method for processing video image information of high-level data and database system
CN110019218A (en) Data storage and querying method and equipment
CN109408746A (en) Portrait information query method, device, computer equipment and storage medium
CN102662988B (en) Method for filtering redundant data of RFID middleware
CN108228664B (en) Unstructured data processing method and device
CN102156799A (en) Cascadable complex event processing engine and train overhauling automatic recording method
CN114979309A (en) Method for supporting random access and processing of networked target data
CN111182577A (en) CDR synthesis monitoring system and method suitable for 5G road tester
CN115514784A (en) Multisource data acquisition middle platform based on Internet of things
CN112307318A (en) Content publishing method, system and device
CN108073705B (en) Distributed mass data aggregation acquisition method
CN116823155A (en) City event scheduling method and device, electronic equipment and storage medium
CN109508244B (en) Data processing method and computer readable medium
CN114528041A (en) Configurable automatic analysis method and device
CN212569771U (en) Trajectory big data feature extraction device
CN113779026A (en) Method and device for processing service data table
CN113641768A (en) Power grid multi-source data-based processing method, system and equipment
CN112434877A (en) Smart city data processing method and device based on cloud computing
CN110532071A (en) A kind of more application schedules system and method based on GPU
CN115080808B (en) Automobile data recorder information management method and system
CN113268363B (en) Global capability-based call tracking method, device, server and storage medium
CN116644039B (en) Automatic acquisition and analysis method for online capacity operation log based on big data
CN116974526A (en) Data development method, device, terminal equipment and storage medium
CN112000728B (en) Business data processing method, readable storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210713