CN113111140A - Method for rapidly analyzing multi-source marine business observation data - Google Patents
Method for rapidly analyzing multi-source marine business observation data
- Publication number
- CN113111140A
- Authority
- CN
- China
- Prior art keywords
- analysis
- data
- rule
- window
- file
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method for rapidly analyzing multi-source marine business observation data, comprising the following steps: S1, receiving original data files of multi-source marine business observation data, classifying them according to intelligent identification rules, and forwarding them, according to the classification result, to different parsing windows of a parser through a pre-built multi-drive parallel rule window; S2, each parsing window parses the original data file according to its configuration information to obtain standard data; and S3, verifying the standard data file and loading it into a database in batches. The method addresses the parsing speed of data processing in this specific application scenario, and uses configurable complex rule algorithms to perform data calculation, deduplication, and quality control inside the window, which reduces the computing resources consumed by data calculation and storage.
Description
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a method for rapidly analyzing multi-source marine business observation data.
Background
Marine environment observation is the core means of acquiring data on marine environmental factors. The diversification of observation platforms and systems means that the business data collected by marine observation instruments comes in many types and formats, which makes comprehensive analysis and use of the data difficult. In recent years, as marine observation services have deepened, observation technology has advanced, and the number of equipment platforms has grown, data volumes have increased, data formats have multiplied, and domestic and foreign data files have come to use different storage formats. All of this makes parsing and using marine business observation data more challenging.
At present, data processing work mostly focuses on general-purpose preprocessing techniques that deal with characteristics of the acquired data such as missing and repeated values: removing unique attributes, handling missing values, attribute encoding, feature selection, principal component analysis, and the like. Marine business observation data is multi-source, multi-type, multi-format, and domain-specific, and the existing processing methods for it mostly apply traditional preprocessing in depth within a single discipline. This leads to the following problems. First, the existing methods handle multi-source data with a separate strategy per service; they cannot automatically identify multi-source processing requirements and lack a unified, fast processing capability. Second, the existing methods cover only preprocessing such as deduplication and deletion, and cannot integrate the complex data conversion algorithms required in the professional field.
Disclosure of Invention
In view of this, the present invention aims to provide a method for rapidly analyzing multi-source marine business observation data, so as to improve parsing efficiency.
To achieve this purpose, the technical scheme of the invention is as follows:
A method for rapidly analyzing multi-source marine business observation data comprises the following steps:
S1, receiving original data files of multi-source marine business observation data, classifying them according to intelligent identification rules, and forwarding them, according to the classification result, to different parsing windows of a parser through a pre-built multi-drive parallel rule window;
S2, each parsing window parses the original data file according to the configuration information to obtain standard data;
S3, verifying the standard data file and loading it into a database in batches.
Further, step S1 specifically includes the following steps:
S11, identifying the data type and source of the original data file, and assigning a label;
S12, classifying the original data files by label according to the intelligent identification rules;
S13, using the multi-drive parallel rule window, scheduling a distribution link according to the classification result and pushing the file to a parsing window of the parser. When it is established, the multi-drive parallel rule window schedules a multi-threaded parallel consumption mode, integrates the parallel techniques, and schedules multiple drives and multiple windows, which gives it the ability to schedule parallel algorithms.
Further, in step S12, if it is determined that the original data file does not comply with the intelligent identification rule, the original file is backed up to the designated directory and an alarm is issued.
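Steps S11 and S12 can be illustrated with a minimal Python sketch. The patent does not specify how source and type are recognized, so the filename prefixes, extensions, label values, and window names below are all hypothetical; the sketch only shows the shape of "label first, then look up a classification rule", with `None` standing for the case that is backed up and alarmed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Label:
    source: str     # hypothetical identifier, e.g. "south_sea_buoy"
    data_type: str  # hypothetical identifier, e.g. "message" or "xml"


# Hypothetical identification rules: filename prefix -> source, extension -> type.
SOURCE_BY_PREFIX = {"NH": "south_sea_buoy", "BH": "north_sea_buoy", "DH": "east_sea_buoy"}
TYPE_BY_SUFFIX = {".txt": "message", ".xml": "xml"}

# Hypothetical classification rules: label -> name of the parsing window.
WINDOW_BY_LABEL = {
    Label("south_sea_buoy", "message"): "window_buoy_message",
    Label("north_sea_buoy", "xml"): "window_buoy_xml",
}


def tag(filename: str) -> Optional[Label]:
    """S11: identify source and data type, or return None if either is unknown."""
    source = next((s for p, s in SOURCE_BY_PREFIX.items() if filename.startswith(p)), None)
    data_type = next((t for x, t in TYPE_BY_SUFFIX.items() if filename.endswith(x)), None)
    if source is None or data_type is None:
        return None
    return Label(source, data_type)


def classify(filename: str) -> Optional[str]:
    """S12: map the label to a parsing window; None means no rule matched."""
    label = tag(filename)
    return WINDOW_BY_LABEL.get(label) if label is not None else None
```

A caller would treat a `None` result from `classify` as the non-compliant case described in step S12.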
Further, in step S2, the configuration information includes a parsing rule configuration, a complex algorithm configuration, an algorithm, and a parsing priority rule configuration, wherein:
the parsing rule configuration constructs parsing rules that match the particular features of marine observation data, forming a parsing rule set;
the complex algorithm configuration adds algorithmic parsing steps on top of the common parsing rules, according to the parsing requirements of each kind of service data.
Further, in step S3, an output device receives an output trigger command and determines whether the verification standard is met. If the standard is met, the data is stored in the database, a command to remove the parsing window is triggered, and an output success record is returned; if not, the data is marked and returned to the parsing process.
Compared with the prior art, the method has the following advantages:
The method addresses the parsing speed of data processing in this specific application scenario, and uses configurable complex rule algorithms to perform data calculation, deduplication, and quality control inside the window, which reduces the computing resources consumed by data calculation and storage.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of data forwarding rules and processing according to an embodiment of the present invention;
FIG. 2 is an overall flowchart of data parsing according to an embodiment of the present invention;
FIG. 3 is a flowchart of the parser's parsing process according to an embodiment of the present invention;
FIG. 4 is a flowchart of data output processing according to an embodiment of the present invention;
FIG. 5 is a sample of an original data file according to an embodiment of the present invention;
FIG. 6 is a sample of a data file after parsing according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention discloses a method for rapidly analyzing multi-source marine business observation data, which comprises the following steps:
Step 1, constructing a forwarder for classifying and forwarding the received original data files of multi-source marine business observation data;
The forwarder receives an original data file of multi-source marine business observation data, classifies it according to the intelligent identification rules, and forwards it, according to the classification result, to one of the parsing windows of the parser through the pre-built multi-drive parallel rule window.
Therefore, before the original data files in step 1 are distributed, a multi-drive parallel rule window is configured according to the characteristics of marine business observation data, so as to improve forwarding and parsing efficiency.
The multi-drive parallel rule window is one of the key techniques behind the efficiency of the invention. When marine business observation data files are pushed to the parser's processing windows they queue for a long time, and as the data flow grows the parsing process becomes seriously congested. In addition, when the parser parses a file, the parsing and calculation of complex algorithm rules also relies on concurrency; without multi-drive concurrency the process is limited by the available concurrency capability, and efficient, fast parsing cannot be achieved.
When it is established, the multi-drive parallel rule window of the embodiment schedules a multi-threaded parallel consumption mode, integrates the parallel techniques, and schedules multiple drives and multiple windows, which gives it the ability to schedule parallel algorithms.
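The multi-threaded parallel consumption mode can be pictured as one queue per parsing window, each drained by its own worker threads. The following Python sketch is only an illustration under that assumption; the function name `run_windows`, the `parse` callback, and the number of workers per window are hypothetical and are not taken from the patent.

```python
import queue
import threading


def run_windows(files_by_window, parse, workers_per_window=2):
    """Drain each window's queue with its own worker threads; return results per window."""
    results = {name: [] for name in files_by_window}
    lock = threading.Lock()
    threads = []

    def consume(name, q):
        # Each worker keeps taking files until its window's queue is empty.
        while True:
            try:
                item = q.get_nowait()
            except queue.Empty:
                return
            parsed = parse(name, item)
            with lock:
                results[name].append(parsed)

    for name, files in files_by_window.items():
        q = queue.Queue()
        for f in files:
            q.put(f)  # the queue is fully loaded before any worker starts
        for _ in range(workers_per_window):
            t = threading.Thread(target=consume, args=(name, q))
            t.start()
            threads.append(t)

    for t in threads:
        t.join()
    return results
```

Because several workers share a window, the order of results inside one window is not guaranteed; a real system that needs ordering would have to carry a sequence number with each file.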
The forwarder in this scheme is based on distribution technology, which it combines with intelligent identification technology for the special application scenario of marine business observation. As shown in FIG. 1, based on the configured rule labels, the type and source of each received data file are identified automatically, and the identification result is returned and recorded in a data label table so that the file can be tracked.
According to the intelligent identification rules, the file class is determined automatically, a distribution link is scheduled, and the marine business observation data files are pushed to the parser for processing in multiple windows, which provides concurrent multi-window forwarding and scheduling.
The intelligent identification rules are configured on the source and the type of the marine business observation data, for example a north sea buoy or east sea buoy source, or a message data format or XML data format.
Based on the combined source-and-type rules, the files are classified and labeled to form a classification rule table. The table is matched against the attributes of the received multi-source data, a classification link is formed automatically, and the data file is forwarded automatically to a processing window of the parser.
If the original data file is judged not to comply with the intelligent identification rules, the original file is backed up to a designated directory and an alarm is issued.
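The fallback for files that match no identification rule can be sketched as follows. The patent does not say how the alarm is raised, so this example simply logs a warning; the function name, logger name, and directory layout are assumptions made for illustration.

```python
import logging
import shutil
from pathlib import Path

logger = logging.getLogger("forwarder")  # hypothetical logger name


def backup_and_alarm(raw_file: Path, backup_dir: Path) -> Path:
    """Copy an unrecognized file to backup_dir and log a warning as the alarm."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / raw_file.name
    shutil.copy2(raw_file, target)  # copy, so the original stays in place
    logger.warning("no identification rule matched %s; backed up to %s", raw_file, target)
    return target
```

Copying rather than moving is a design choice of this sketch; it keeps the original available for manual inspection.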
Step 2, constructing a parser containing the complex configuration rule algorithms, in which a parsing window parses the original data file according to configuration information to obtain standard data;
The conventional overall process of data parsing is shown in FIG. 2: it mainly converts an original data file into a standard data file and issues a file loading event. The parser here is constructed to solve the problem of integrating existing preprocessing techniques with complex algorithms from the professional field. The complex configuration rule algorithms serve the data parsing requirements of the specific application scenario of marine business observation data. They allow rule-based configuration of parsing algorithms for multi-dimensional marine data such as U component quality factors, longitude and latitude, space, and time, and thereby integrate marine-specific algorithms with parsing technology.
The parser invokes the multi-drive parallel rule window constructed in step 1 and parses the data files pushed by the forwarder by matching the window to a parsing rule. The files are pushed according to the classification rules through the multi-drive parallel rule window, and the parsing rules are scheduled according to the classification made in step 1, so that parsing completes concurrently.
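Matching a window to its parsing rule amounts to a lookup from the classification result to a parsing function. The Python sketch below assumes two made-up input layouts (a `key=value` message line and a comma-separated table); the real buoy formats are not given in the text, so the rule names and formats are purely illustrative.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def parse_message(text: str) -> List[Record]:
    """Hypothetical rule: one 'key=value;key=value' record per line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        records.append(dict(pair.split("=", 1) for pair in line.split(";") if pair))
    return records


def parse_csv_like(text: str) -> List[Record]:
    """Hypothetical rule: a header line followed by comma-separated rows."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split(",")
    return [dict(zip(header, row.split(","))) for row in lines[1:]]


# Hypothetical binding of parsing windows to parsing rules.
PARSE_RULES: Dict[str, Callable[[str], List[Record]]] = {
    "window_buoy_message": parse_message,
    "window_buoy_realtime": parse_csv_like,
}


def parse_in_window(window: str, text: str) -> List[Record]:
    """Look up the rule bound to the window and apply it to the raw text."""
    return PARSE_RULES[window](text)
```

Keeping the rules in a table means a new data source only needs a new entry, which is in the spirit of the configurable rule set the description refers to.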
The parser needs an algorithm configuration component that meets the algorithmic requirements of the complex configuration rules. The component integrates complex rule algorithms and models such as lVal × sin(lDir) and (LatDuFen - LatDu) × 60, and enables complex algorithm rules to be configured during parsing for multi-dimensional professional quantities such as U component quality factors, longitude and latitude, space, and time.
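The two expressions named above can be written out directly. The text does not define the variables, so this sketch makes explicit assumptions: `lVal` is taken to be a magnitude such as a speed, `lDir` a direction in degrees, `LatDuFen` a latitude in decimal degrees, and `LatDu` its whole-degree part, so that the second expression yields the minutes part.

```python
import math


def u_component(l_val: float, l_dir_deg: float) -> float:
    """lVal x sin(lDir), with lDir assumed to be given in degrees."""
    return l_val * math.sin(math.radians(l_dir_deg))


def latitude_minutes(lat_du_fen: float, lat_du: float) -> float:
    """(LatDuFen - LatDu) x 60: the fractional degrees expressed in minutes (assumed meaning)."""
    return (lat_du_fen - lat_du) * 60
```

Under these assumptions a magnitude of 10 at a direction of 90 degrees gives a U component of about 10, and a latitude of 20.5 with a whole-degree part of 20 gives 30 minutes.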
The parser rule configuration includes a parsing rule configuration, a complex algorithm configuration, an algorithm, a parsing priority rule, and the like, as shown in FIG. 3.
1) The parsing rule configuration constructs parsing rules that match the particular features of marine observation data, forming a parsing rule set, for example rules for OSMAR-041, OSMAR-S ASSIC, and volunteer punctual messages.
2) The complex algorithm configuration adds an algorithmic parsing step on top of the common parsing rules, according to the parsing requirements of each kind of service data, and integrates complex rule algorithms such as lVal × sin(lDir) and (LatDuFen - LatDu) × 60 into the calculation.
3) Standard processing such as deduplication and quality control is performed on the parsed file.
With the algorithm configuration component in place, the parser can calculate data and perform deduplication and quality control inside the window. Because the calculation algorithms are integrated into the parser (the processing flow is shown in the drawing), the computing resources consumed by data calculation and storage are reduced.
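In-window deduplication and quality control can be sketched as a single pass over the parsed records. The key fields, the range limits, and the flag values `good` and `suspect` below are hypothetical; the patent does not state which quality checks are applied.

```python
def dedupe_and_flag(records, key_fields, limits):
    """Drop records whose key was already seen and attach a quality flag to the rest."""
    seen = set()
    out = []
    for rec in records:
        key = tuple(rec.get(f) for f in key_fields)
        if key in seen:
            continue  # duplicate observation: keep only the first occurrence
        seen.add(key)
        flagged = dict(rec)
        ok = True
        for field, (low, high) in limits.items():
            value = rec.get(field)
            # A missing value or a value outside the allowed range fails the check.
            if value is None or not (low <= value <= high):
                ok = False
        flagged["qc"] = "good" if ok else "suspect"
        out.append(flagged)
    return out
```

Doing both steps in the same pass is what lets the window hand over already cleaned records, so nothing needs to be stored and re-read for a separate cleaning stage.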
Step 3, constructing an output device for rapidly outputting the standard files completed by the parser windows.
The parsed standard data files are output automatically in batches by the output device and loaded into the marine environment basic database.
The output device receives the output trigger for the result values of the parallel parsing windows, converts the parsed data into standard files, writes them to disk automatically in batches, triggers the loading program, and loads the standard files into the database.
As shown in FIG. 4, the output device receives the output trigger command and checks the standard requirement. If the requirement is met, the file is written to disk automatically, the command to remove the parsing window is triggered, and an output success record is returned. If not, the data is marked and returned to the parsing process.
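The verify, load, and remove-window sequence of the output device can be sketched with Python's built-in `sqlite3` module standing in for the marine environment basic database. The table name, the required fields, and the status strings are assumptions for illustration only.

```python
import sqlite3

REQUIRED = ("station", "time", "value")  # hypothetical standard fields


def output_window(conn, window_id, records, open_windows):
    """Verify, batch-load, then remove the window; otherwise mark it for re-parsing."""
    valid = all(all(r.get(f) is not None for f in REQUIRED) for r in records)
    if not valid:
        # Verification failed: leave the window open and send it back to parsing.
        return {"window": window_id, "status": "marked_for_reparse"}
    with conn:  # commits on success, rolls back if the insert raises
        conn.executemany(
            "INSERT INTO observation (station, time, value) VALUES (?, ?, ?)",
            [(r["station"], r["time"], r["value"]) for r in records],
        )
    open_windows.discard(window_id)  # the 'remove parsing window' command
    return {"window": window_id, "status": "success", "rows": len(records)}
```

Using `executemany` inside one transaction is one simple way to realize batch loading: either the whole window's output is stored or none of it is.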
Examples
The main objective of the invention is the rapid parsing of multi-source marine business observation data. In this example, a received south sea buoy observation file is parsed, according to the parsing requirements of marine business observation data, into the standard anchored-buoy record format, to illustrate the specific parsing process.
Step 1, forwarding the multi-source original observation files
After the forwarder receives the south sea buoy minute message data and the buoy real-time observation data, it automatically identifies the data type and its characteristics, schedules links according to the number and characteristics of the files, forwards the files according to the forwarding rules, and schedules the parser's parsing windows. The original data file is shown in FIG. 5.
Step 2, parsing the multi-source observation files
Parsing rules are mapped according to the rule by which the forwarder triggers the parsing window, and complex algorithm calculations are carried out on the south sea buoy minute message data and the south sea buoy real-time observation data according to the characteristics of each data file.
To meet the U component quality factor requirement, the lVal × sin(lDir) algorithm is called, the calculation runs in parallel, and the results are pushed.
File rule processing is then performed according to rules for missing values, quality conformity, and the like, and the parsing result is produced efficiently in parallel mode. A sample of the parsed data file is shown in FIG. 6.
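The parallel calculation in this example can be sketched with a thread pool that applies the U component formula to each record and leaves records with missing inputs unfilled. The record fields `speed` and `dir_deg`, the degree convention, and the worker count are assumptions, not details from the patent.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def with_u(record):
    """Add 'u' = speed x sin(direction); 'u' is None if either input is missing."""
    speed, direction = record.get("speed"), record.get("dir_deg")
    if speed is None or direction is None:
        return {**record, "u": None}
    return {**record, "u": speed * math.sin(math.radians(direction))}


def compute_batch(records, max_workers=4):
    """Apply with_u to the batch in parallel; Executor.map preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(with_u, records))
```

For a calculation this small, threads mainly illustrate the structure; a CPU-bound workload in Python would more likely use processes, which is a design decision outside what the text specifies.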
Step 3, outputting the standard file format
The successfully parsed standard data files (in the standard anchored-buoy record format) are loaded, stored, and exported for use.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (5)
1. The method for rapidly analyzing the multi-source marine business observation data is characterized by comprising the following steps:
S1, receiving original data files of multi-source marine business observation data, classifying them according to intelligent identification rules, and forwarding them, according to the classification result, to different parsing windows of a parser through a pre-built multi-drive parallel rule window;
S2, each parsing window parses the original data file according to the configuration information to obtain a standard file;
S3, verifying the standard data file and loading it into a database in batches.
2. The method of claim 1, wherein: the step S1 specifically includes the following steps:
S11, identifying the data type and source of the original data file, and assigning a label;
S12, classifying the original data files by label according to the intelligent identification rules;
S13, using the multi-drive parallel rule window, scheduling a distribution link according to the classification result and pushing the file to a parsing window of the parser; wherein,
when it is established, the multi-drive parallel rule window schedules a multi-threaded parallel consumption mode, integrates the parallel techniques, and schedules multiple drives and multiple windows, which gives it the ability to schedule parallel algorithms.
3. The method of claim 2, wherein: in step S12, if the original data file is determined not to comply with the intelligent recognition rule, the original file is backed up to the designated directory and an alarm is given.
4. The method of claim 1, wherein: in step S2, the configuration information includes a parsing rule configuration, a complex algorithm configuration, an algorithm, and a parsing priority rule configuration; wherein,
the parsing rule configuration constructs parsing rules that match the particular features of marine observation data, forming a parsing rule set;
the complex algorithm configuration adds algorithmic parsing steps on top of the common parsing rules, according to the parsing requirements of each kind of service data.
5. The method of claim 1, wherein: in step S3, an output device receives the output trigger command and determines whether the verification standard is met;
if the standard is met, the data is stored in the database, a command to remove the parsing window is triggered, and an output success record is returned;
if not, the data is marked and returned to the parsing process.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110516907.3A CN113111140A (en) | 2021-05-12 | 2021-05-12 | Method for rapidly analyzing multi-source marine business observation data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110516907.3A CN113111140A (en) | 2021-05-12 | 2021-05-12 | Method for rapidly analyzing multi-source marine business observation data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113111140A (en) | 2021-07-13 |
Family
ID=76722065
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110516907.3A Pending CN113111140A (en) | 2021-05-12 | 2021-05-12 | Method for rapidly analyzing multi-source marine business observation data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113111140A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210713 |